<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>小生这厢有礼了(BioFaceBook Personal Blog) &#187; R</title>
	<atom:link href="https://www.biofacebook.com/?feed=rss2&#038;tag=r-2" rel="self" type="application/rss+xml" />
	<link>https://www.biofacebook.com</link>
	<description>记录生物信息学点滴足迹（NGS,Genome,Meta,Linux)</description>
	<lastBuildDate>Sun, 23 Aug 2020 03:28:53 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.41</generator>
	<item>
		<title>R 列表随意组合 data.frame(x,y)</title>
		<link>https://www.biofacebook.com/?p=715</link>
		<comments>https://www.biofacebook.com/?p=715#comments</comments>
		<pubDate>Thu, 24 Jan 2013 05:02:13 +0000</pubDate>
		<dc:creator><![CDATA[szypanther]]></dc:creator>
				<category><![CDATA[R语言]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.biofacebook.com/?p=715</guid>
		<description><![CDATA[&#62; protein_data_B30_min&#60;-protein_data[1:2548,10:12] &#62; protein_data_M30_min&#60;-protein_data[1:2548,19:21] &#62; protein_data_30_min&#60;-data.frame(protein_data_B30_min,protein_data_M30_min) &#62; protein_data_30_min[1:2,] B30 B30.1 B30.2 Mgo30 Mgo30.1 Mgo30.2 1 870.5042 867.0873 0 1086.828 1481.228 2726.929 2 5167.6455 4646.3450 0 4409.320 3017.866 3216.642 [...]]]></description>
				<content:encoded><![CDATA[<pre><span style="color: #0000ff;"><span style="font-family: Monospace,永中宋体;"><span style="font-size: small;">&gt; protein_data_B30_min&lt;-protein_data[1:2548,10:12]</span></span></span>
<span style="color: #0000ff;"><span style="font-family: Monospace,永中宋体;"><span style="font-size: small;">&gt; protein_data_M30_min&lt;-protein_data[1:2548,19:21]</span></span></span>
<span style="color: #0000ff;"><span style="font-family: Monospace,永中宋体;"><span style="font-size: small;">&gt; protein_data_30_min&lt;-data.frame(protein_data_B30_min,protein_data_M30_min)</span></span></span>
<span style="color: #0000ff;"><span style="font-family: Monospace,永中宋体;"><span style="font-size: small;">&gt; protein_data_30_min[1:2,]</span></span></span>
<span style="font-family: DejaVu Sans Mono;"><span style="color: #000000;"><span style="font-family: Monospace,永中宋体;"><span style="font-size: small;">B30 B30.1 B30.2 Mgo30 Mgo30.1 Mgo30.2</span></span></span></span>
<span style="color: #000000;"><span style="font-family: Monospace,永中宋体;"><span style="font-size: small;">1 870.5042 867.0873 0 1086.828 1481.228 2726.929</span></span></span>
<span style="color: #000000;"><span style="font-family: Monospace,永中宋体;"><span style="font-size: small;">2 5167.6455 4646.3450 0 4409.320 3017.866 3216.642</span></span></span></pre>
]]></content:encoded>
			<wfw:commentRss>https://www.biofacebook.com/?feed=rss2&#038;p=715</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>R画 维恩图</title>
		<link>https://www.biofacebook.com/?p=682</link>
		<comments>https://www.biofacebook.com/?p=682#comments</comments>
		<pubDate>Tue, 15 Jan 2013 07:25:18 +0000</pubDate>
		<dc:creator><![CDATA[szypanther]]></dc:creator>
				<category><![CDATA[R语言]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.biofacebook.com/?p=682</guid>
		<description><![CDATA[<p>&#160;</p> &#62; install.packages('plotrix') Installing package(s) into ‘/home/shenzy/R/x86_64-pc-linux-gnu-library/2.15’ (as ‘lib’ is unspecified) 试开URL’http://cran.csiro.au/src/contrib/plotrix_3.4-5.tar.gz' Content type 'application/x-gzip' length 211113 bytes (206 Kb) 打开了URL ================================================== downloaded 206 Kb * installing *source* package ‘plotrix’ ... ** 成功将‘plotrix’程序包解包并MD5和检查 ** R ** data ** demo ** inst ** preparing package for lazy loading ** help *** installing help indices ** building [...]]]></description>
				<content:encoded><![CDATA[<p>&nbsp;</p>
<pre>&gt; install.packages('plotrix') 
Installing package(s) into ‘/home/shenzy/R/x86_64-pc-linux-gnu-library/2.15’
(as ‘lib’ is unspecified)
试开URL’http://cran.csiro.au/src/contrib/plotrix_3.4-5.tar.gz'
Content type 'application/x-gzip' length 211113 bytes (206 Kb)
打开了URL
==================================================
downloaded 206 Kb

* installing *source* package ‘plotrix’ ...
** 成功将‘plotrix’程序包解包并MD5和检查
** R
** data
** demo
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded

* DONE (plotrix)

The downloaded source packages are in
	‘/tmp/RtmpQVXKKd/downloaded_packages’</pre>
<p>library(plotrix)<br />
plot(0:10,seq(0,10,length=11),type=&#8217;n&#8217;,axes=F,xlab=&#8221;,ylab=&#8221;)<br />
draw.circle(2,5,2,col=rgb(154/255,0/255,205/255,0.6))<br />
draw.circle(4,5,2,col=rgb(21/255,3/255,252/255,0.6)) text(1,5,labels=&#8217;10.12%&#8217;,col=&#8217;white&#8217;,font=2) text(5,5,labels=&#8217;40.38%&#8217;,col=&#8217;white&#8217;,font=2)<br />
text(3,5,labels=&#8217;49.5%&#8217;,col=&#8217;white&#8217;,font=2)<br />
legend(6.2,5,pch=15, xjust=0,yjust=0.5,bty=&#8217;n&#8217;,cex=1.3,col=c(rgb(154/255,0/255,205/255),rgb(74/255,2/255,233/255),rgb(21/255,3/255,252/255) ), legend=c(&#8216;sample 1 uniq&#8217;,&#8217;sample&amp; sample 2&#8242;,&#8217;sample 2 uniq&#8217;))<br />
text(3.5,7.5,labels=&#8217;Venn chart for uniq_sRNAs&#8217;,font=2,cex=1.5)</p>
<p><a href="http://www.biofacebook.com/wp-content/uploads/2013/01/Rplot.jpeg"><img class="aligncenter size-full wp-image-683" title="Rplot" src="http://www.biofacebook.com/wp-content/uploads/2013/01/Rplot.jpeg" alt="" width="850" height="500" /></a></p>
]]></content:encoded>
			<wfw:commentRss>https://www.biofacebook.com/?feed=rss2&#038;p=682</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using prcomp/princomp for PCA in R （三）</title>
		<link>https://www.biofacebook.com/?p=603</link>
		<comments>https://www.biofacebook.com/?p=603#comments</comments>
		<pubDate>Fri, 31 Aug 2012 04:43:58 +0000</pubDate>
		<dc:creator><![CDATA[szypanther]]></dc:creator>
				<category><![CDATA[R语言]]></category>
		<category><![CDATA[统计学习]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.biofacebook.com/?p=603</guid>
		<description><![CDATA[ Testing i.pca ~ prcomp(), m.eigensystem ~ princomp() <p>1. Briefly about PCA 2. The modules/functions that implement PCA in GRASS &#38; R 3. My claims (Entitled Comments) 4. Evidence (=the numbers derived from i.pca, prcomp, princomp, m.eigensystem using some MODIS surface reflectance bands).</p> <p>Finally all is clear _but_ one thing: the only &#8220;unknown&#8221; variable (to [...]]]></description>
				<content:encoded><![CDATA[<div id="topic-search-box">
<form id="search-input" name="query"><input type="hidden" name="macro" value="search_page" /> <input type="hidden" name="node" value="3681316" /> </form>
</div>
<h1 id="post-title">Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()</h1>
<p>1. Briefly about PCA<br />
2. The modules/functions that implement PCA in GRASS &amp; R<br />
3. My claims (Entitled Comments)<br />
4. Evidence (=the numbers derived from i.pca, prcomp, princomp,<br />
m.eigensystem using some MODIS surface reflectance bands).</p>
<p>Finally all is clear _but_ one thing: the only &#8220;unknown&#8221; variable (to<br />
me) is still the Eigenvalue provided by i.pca. I can&#8217;t nail this one:<br />
*** how is it calculated? *** Looks like it is some _weighted_<br />
variance&#8230; !??</p>
<p>The only thing I noticed with some testing (not thoroughly though) is<br />
that the Eigenvalue (=Variance) reported by &#8220;i.pca&#8221; is ~0.58 of the<br />
variance reported by &#8220;prcomp()&#8221; (sdev^2) !? That is:</p>
<p>sqrt(Eigenvalue(i.pca)) / sdev(prcomp) = 0.58xxx</p>
<p>Hamish, if you have the time (or anyone , could you please &#8220;translate&#8221;<br />
the code in &#8220;grass6_dev/imagery/i.pca/main.c&#8221;, concerning the &#8220;eigval&#8221;<br />
variable, in some pseudocode. It is really the last thing to clarify ( I<br />
think ).</p>
<p>Kindest regards, Nikos</p>
<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br />
# 1. PCA ### ### ### ###</p>
<p>*. PCA is performed via:</p>
<p>&#8211; method 1 (eigenvectors) : by determining the eigenvalues and<br />
eigenvectors of a given covariance or correlation matrix.</p>
<p>&#8211; covariance matrix -&gt; non-standardised<br />
&#8211; correlation matrix -&gt; standardised</p>
<p>&#8211; method 2 (SVD): based on by a singular value decomposition of the<br />
data matrix. This is a more general solution to PCA than the<br />
&#8220;eigenvectors&#8221;.</p>
<p>&#8230;The difference between the two methods is explained in [1]&#8230;<br />
&#8230;SVD is used for numerical accuracy&#8230; (R Documentation)<br />
&#8230;Results between the two methos can differ a bit&#8230;</p>
<p>&#8230;Data centering reduces the square mean error of<br />
*approximating* the input data<br />
&#8230;Data scaling</p>
<p>*. PCA returns:</p>
<p>&#8211; eigenvalues (=variance _or_ sdev^2)<br />
&#8211; eigenvectors (=loadings, rotation, weighting coefficient)</p>
<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-<br />
# 2. Implementations of PCA ### ### ### ###</p>
<p>* R&#8217;s implementation of:</p>
<p>&#8211; method 1 (eigen) is the _princomp()_ function</p>
<p>-*applies*:<br />
\data centering<br />
\data scaling</p>
<p>-returns:<br />
\sdev  &#8211;  standard deviation of principal components (that is:<br />
the sqrt(Variance)<br />
\loadings  &#8211;  eigenvectors</p>
<p>&#8211; method 2 (SVD) is the _prcomp()_ function</p>
<p>-options:<br />
\data centering (center= TRUE | FALSE) [default is TRUE]<br />
\data scaling (scale=TRUE | FALSE) [default is FALSE]</p>
<p>-returns:<br />
\sdev (as above)<br />
\rotation (same as loadings)</p>
<p>* GRASS&#8217; implementation of:</p>
<p>&#8211; method 1 (eigen) is the _m.eigensystem_ module</p>
<p>-returns:<br />
E -&gt; eigenvalue -&gt; Variance(E) -&gt; sdev(E)^2<br />
V -&gt; eigenvectors associated with E<br />
N -&gt; normalized eigenvectors V &#8211;&gt; This is _data centering_.<br />
W -&gt; N vector multiplied by the square root of the magnitude of<br />
the eigenvalue (E), in other words: W = N * sdev(E)</p>
<p>&#8211; method 2 (SVD) is the   _i.pca_ module</p>
<p>-returns:<br />
\Eigenvalues (latest version by Hamish)<br />
\Eigenvectors</p>
<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8211;<br />
# 3. Comments ### ### ### ###</p>
<p># 1: it is (?) meaningless to compare the numerical results<br />
(eigenvectors) of &#8220;i.pca&#8221; with the results of &#8220;m.eigensystem&#8221; since they<br />
implement two different algebraic solutions to PCA with different<br />
options (data centering is NOT performed by i.pca). The different<br />
algebraic solutions don&#8217;t give big differences. Data centering does.</p>
<p># 2: &#8220;i.pca&#8221; corresponds to &#8220;prcomp( ,center=FALSE,scale=FALSE)&#8221; and<br />
&#8220;m.eigensystem&#8221; corresponds to &#8220;princomp()&#8221;</p>
<p># 3: The standard deviations (sdev) reported by prcomp(,center=TRUE,<br />
scale=FALSE) _match_ the variances (=eigenvalues = sdev^2) reported by<br />
&#8220;m.eigensystem&#8221;.</p>
<p># 4: Obviously, &#8220;prcomp()&#8221; with center=TRUE and scaling=FALSE by default<br />
gives (almost) same results as &#8220;princomp()&#8221;.</p>
<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br />
# 4. Comparing PCA results ### ### ### ###</p>
<p># prcomp() #############################################################</p>
<p>### Details for prcomp {stats} from R Documentation<br />
The calculation is done by a singular value decomposition of the<br />
(centered and possibly scaled) data matrix, not by using eigen on the<br />
covariance matrix. This is generally the preferred method for numerical<br />
accuracy.<br />
## more details in [2]<br />
#Default settings of prcomp(): _center=TRUE_ and _scale=FALSE_</p>
<p>### with center=FALSE, scale=FALSE<br />
prcomp(mod07, center=FALSE, scale=FALSE) &lt;&lt;== this corresponds to i.pca</p>
<p>Standard deviations:<br />
[1] 4288.3788  476.8904  114.3971</p>
<p>Rotation:<br />
PC1        PC2        PC3<br />
MOD2007_242_500_sur_refl_b02 -0.6353238  0.7124070 -0.2980602<br />
MOD2007_242_500_sur_refl_b06 -0.6485551 -0.2826985  0.7067234<br />
MOD2007_242_500_sur_refl_b07 -0.4192135 -0.6423066 -0.6416403</p>
<p>### with center=TRUE, scale=FALSE<br />
prcomp(mod07, center=TRUE, scale=FALSE)<br />
Standard deviations:<br />
[1] 857.5749 436.0928 108.5085</p>
<p>Rotation:<br />
PC1         PC2        PC3<br />
MOD2007_242_500_sur_refl_b02 0.4184468  0.83881578 -0.3482677<br />
MOD2007_242_500_sur_refl_b06 0.7249935 -0.07751872  0.6843794<br />
MOD2007_242_500_sur_refl_b07 0.5470710 -0.53886820 -0.6405735</p>
<p>### with center=TRUE, scale=TRUE<br />
prcomp(mod07, center=TRUE, scale=TRUE)<br />
Standard deviations:<br />
[1] 1.5030740 0.8397807 0.1885121</p>
<p>Rotation:<br />
PC1        PC2        PC3<br />
MOD2007_242_500_sur_refl_b02 0.4807060  0.8202848 -0.3099267<br />
MOD2007_242_500_sur_refl_b06 0.6561893 -0.1020540  0.7476634<br />
MOD2007_242_500_sur_refl_b07 0.5816677 -0.5627768 -0.5873201</p>
<p>### with center=FALSE, scale=TRUE<br />
prcomp(mod07, center=FALSE, scale=TRUE)<br />
Standard deviations:<br />
[1] 1.71889117 0.20787871 0.04689961</p>
<p>Rotation:<br />
PC1         PC2        PC3<br />
MOD2007_242_500_sur_refl_b02 -0.5747640  0.74015136 -0.3490305<br />
MOD2007_242_500_sur_refl_b06 -0.5812896 -0.06907305  0.8107597<br />
MOD2007_242_500_sur_refl_b07 -0.5759763 -0.66888331 -0.4699430</p>
<p># i.pca ##############################################################</p>
<p># perform pca<br />
i.pca input=MOD07_b02,MOD07_b06,MOD07_b07 output=TEST</p>
<p>Eigen values, (vectors), and [percent importance]:<br />
PC1 6307563.04 ( -0.64 -0.65 -0.42 ) [ 98.71% ]<br />
PC2 78023.63 ( -0.71 0.28 0.64 ) [ 1.22% ]<br />
PC3 4504.60 ( -0.30 0.71 -0.64 ) [ 0.07% ]</p>
<p>### Comment: Comparing with prcomp()&#8217;s results it is obvious that<br />
&#8220;i.pca&#8221; implements the SVD method _without_ centering and _without_<br />
scaling the data prior to the analysis. ###</p>
<p># princomp() &#8211; cov #####################################################</p>
<p># Details for princomp {stats} from R Documentation<br />
The calculation is done using eigen on the correlation or covariance<br />
matrix</p>
<p>### using _covariance_ matrix<br />
# prints only standard deviations<br />
princomp(mod07)</p>
<p>Call:<br />
princomp(x = mod07)</p>
<p>Standard deviations:<br />
Comp.1   Comp.2   Comp.3<br />
857.5737 436.0922 108.5083</p>
<p>3  variables and  350596 observations.</p>
<p># get loadings (=rotation = eigenvectors)<br />
(princomp(mod07))$loadings</p>
<p>Loadings:<br />
Comp.1 Comp.2 Comp.3<br />
MOD2007_242_500_sur_refl_b02 -0.418  0.839  0.348<br />
MOD2007_242_500_sur_refl_b06 -0.725        -0.684<br />
MOD2007_242_500_sur_refl_b07 -0.547 -0.539  0.641</p>
<p>Comp.1 Comp.2 Comp.3<br />
SS loadings     1.000  1.000  1.000<br />
Proportion Var  0.333  0.333  0.333<br />
Cumulative Var  0.333  0.667  1.000</p>
<p># m.eigensystem &#8211; cov #################################################</p>
<p># Determine eigen values and vectors of _covariance_ matrix<br />
## Marked with &#8220;#&#8221; are the _N_&#8217;s that correspond to the eigenvectors<br />
derived from R&#8217;s princomp() ##<br />
(echo 3; r.covar MOD07_b02,MOD07_b06,MOD07_b07)|m.eigensystem</p>
<p>r.covar: complete &#8230;<br />
100%<br />
E    778244.0258462029          .0000000000    79.20<br />
V          .5006581842          .0000000000<br />
V          .8256483300          .0000000000<br />
V          .6155834548          .0000000000<br />
N      #   .4372107421   #      .0000000000<br />
N      #   .7210155161   #      .0000000000<br />
N      #   .5375717557   #      .0000000000<br />
W       385.6991853500          .0000000000<br />
W       636.0664787886          .0000000000<br />
W       474.2358050886          .0000000000<br />
E    192494.5769628266          .0000000000    19.59<br />
V         -.8689798010          .0000000000<br />
V          .0996340298          .0000000000<br />
V          .5731134848          .0000000000<br />
N      #  -.8309940700   #      .0000000000<br />
N      #   .0952787255   #      .0000000000<br />
N      #   .5480609638   #      .0000000000<br />
W      -364.5920328433          .0000000000<br />
W        41.8027823088          .0000000000<br />
W       240.4573848757          .0000000000<br />
E     11876.4548199713          .0000000000     1.21<br />
V          .2872248982          .0000000000<br />
V         -.5731591248          .0000000000<br />
V          .5351449518          .0000000000<br />
N      #   .3439413070   #      .0000000000<br />
N      #  -.6863370819   #      .0000000000<br />
N      #   .6408165005   #      .0000000000<br />
W        37.4824307850          .0000000000<br />
W       -74.7964308085          .0000000000<br />
W        69.8356366100          .0000000000</p>
<p># princomp() &#8211; cov #####################################################</p>
<p>### using _correlation_ matrix<br />
princomp(mod07, cor=TRUE)</p>
<p>Call:<br />
princomp(x = mod07, cor = TRUE)</p>
<p>Standard deviations:<br />
Comp.1    Comp.2    Comp.3<br />
1.5030740 0.8397807 0.1885121</p>
<p>3  variables and  350596 observations.</p>
<p># get loadings<br />
(princomp(mod07, cor=TRUE))$loadings</p>
<p>Loadings:<br />
Comp.1 Comp.2 Comp.3<br />
MOD2007_242_500_sur_refl_b02 -0.481  0.820  0.310<br />
MOD2007_242_500_sur_refl_b06 -0.656 -0.102 -0.748<br />
MOD2007_242_500_sur_refl_b07 -0.582 -0.563  0.587</p>
<p>Comp.1 Comp.2 Comp.3<br />
SS loadings     1.000  1.000  1.000<br />
Proportion Var  0.333  0.333  0.333<br />
Cumulative Var  0.333  0.667  1.000</p>
<p># m.eigensystem &#8211; corr #################################################</p>
<p># Determine eigen values and vectors of _correlation_ matrix<br />
## Marked with &#8220;#&#8221; are the _N_&#8217;s that correspond to the eigenvectors<br />
derived from R&#8217;s princomp() ##<br />
(echo 3; r.covar -r MOD07_b02,MOD07_b06,MOD07_b07)|m.eigensystem</p>
<p>r.covar: complete &#8230;<br />
100%<br />
E         2.2915877718          .0000000000    76.39<br />
V         -.5755655569          .0000000000<br />
V         -.7660355041          .0000000000<br />
V         -.6809380186          .0000000000<br />
N     #   -.4896413269   #      .0000000000<br />
N     #   -.6516766616   #      .0000000000<br />
N     #   -.5792830912   #      .0000000000<br />
W         -.7412186091          .0000000000<br />
W         -.9865075560          .0000000000<br />
W         -.8769182329          .0000000000<br />
E          .6740687010          .0000000000    22.47<br />
V          .8667178982          .0000000000<br />
V         -.1116525720          .0000000000<br />
V         -.6069908335          .0000000000<br />
N      #   .8145815825   #      .0000000000<br />
N      #  -.1049362531   #      .0000000000<br />
N      #  -.5704780699   #      .0000000000<br />
W          .6687852213          .0000000000<br />
W         -.0861544341          .0000000000<br />
W         -.4683721194          .0000000000<br />
E          .0343435272          .0000000000     1.14<br />
V          .2486404469          .0000000000<br />
V         -.6006166822          .0000000000<br />
V          .4655120098          .0000000000<br />
N      #   .3109794470   #      .0000000000<br />
N      #  -.7512029762   #      .0000000000<br />
N      #   .5822249325   #      .0000000000<br />
W          .0576307320          .0000000000<br />
W         -.1392129859          .0000000000<br />
W          .1078979635          .0000000000</p>
<p>[1] <a href="http://www.snl.salk.edu/%7Eshlens/pub/notes/pca.pdf" rel="nofollow" target="_top">http://www.snl.salk.edu/~shlens/pub/notes/pca.pdf</a></p>
<p>[2] Copy-pasted from R Documentation</p>
<p>### prcomp() returns the following:</p>
<p>1. sdev<br />
the standard deviations of the principal components (i.e.,<br />
the square roots of the eigenvalues of the cov./cor. matrix,<br />
though the calculation is actually done with the singular<br />
values of the data matrix).</p>
<p>2. rotation<br />
the matrix of variable loadings (i.e., a matrix whose<br />
columns contain the eigenvectors). The function princomp<br />
returns this in the element loadings.</p>
<p>* Note</p>
<p>The signs of the columns of the rotation matrix are<br />
arbitrary, and so may differ between different programs<br />
for PCA, and even between different builds of R.</p>
<p>转载：http://osgeo-org.1560.n6.nabble.com/Testing-i-pca-prcomp-m-eigensystem-princomp-td4049804.html</p>
]]></content:encoded>
			<wfw:commentRss>https://www.biofacebook.com/?feed=rss2&#038;p=603</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using prcomp/princomp for PCA in R （二）</title>
		<link>https://www.biofacebook.com/?p=590</link>
		<comments>https://www.biofacebook.com/?p=590#comments</comments>
		<pubDate>Fri, 31 Aug 2012 04:08:34 +0000</pubDate>
		<dc:creator><![CDATA[szypanther]]></dc:creator>
				<category><![CDATA[My project]]></category>
		<category><![CDATA[R语言]]></category>
		<category><![CDATA[统计学习]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://www.biofacebook.com/?p=590</guid>
		<description><![CDATA[<p>###############################</p> <p>PCA ############################### install.packages(&#8220;vegan&#8221;) library(vegan)</p> <p>&#62; STpcoa&#60;-read.table(file=&#8221;bactera_16s_final.subsample.phylip.tre1.weighted.phylip.pcoa.axes&#8221;, header=T,row.names=1) &#62; STpcoa axis1 axis2 axis3 axis4 Cellulose -0.020878 -0.234601 0.167454 0 Foodwaste -0.234592 0.221741 0.085802 0 Sludge 0.368882 0.100725 -0.010570 0 Xylan -0.113413 -0.087865 -0.242686 0 &#62;pl.STpcoa&#60;-princomp(STpcoa) &#62; summary(pl.STpcoa) Importance of components: Comp.1 Comp.2 Comp.3 Comp.4 Standard deviation 0.2260563 0.1746944 0.1536319 0 Proportion of Variance 0.4856521 0.2900347 [...]]]></description>
				<content:encoded><![CDATA[<p>###############################</p>
<p>PCA<br />
###############################<br />
install.packages(&#8220;vegan&#8221;)<br />
library(vegan)</p>
<p>&gt; STpcoa&lt;-read.table(file=&#8221;bactera_16s_final.subsample.phylip.tre1.weighted.phylip.pcoa.axes&#8221;, header=T,row.names=1)<br />
&gt; STpcoa<br />
axis1 axis2 axis3 axis4<br />
Cellulose -0.020878 -0.234601 0.167454 0<br />
Foodwaste -0.234592 0.221741 0.085802 0<br />
Sludge 0.368882 0.100725 -0.010570 0<br />
Xylan -0.113413 -0.087865 -0.242686 0<br />
&gt;pl.STpcoa&lt;-princomp(STpcoa)<br />
&gt; summary(pl.STpcoa)<br />
Importance of components:<br />
Comp.1 Comp.2 Comp.3 Comp.4<br />
Standard deviation 0.2260563 0.1746944 0.1536319 0<br />
Proportion of Variance 0.4856521 0.2900347 0.2243133 0<br />
Cumulative Proportion 0.4856521 0.7756867 1.0000000 1</p>
<p>&gt; ls(pl.STpcoa)<br />
[1] &#8220;call&#8221; &#8220;center&#8221; &#8220;loadings&#8221; &#8220;n.obs&#8221; &#8220;scale&#8221; &#8220;scores&#8221; &#8220;sdev&#8221;<br />
&gt; class(pl.STpcoa)<br />
[1] &#8220;princomp&#8221;<br />
&gt; nmds.col&lt;-c(rep(&#8220;green&#8221;, 1), rep(&#8220;blue&#8221;, 1), rep(&#8220;black&#8221;,1), rep(&#8220;red&#8221;,1))<br />
&gt; plot(pl.STpcoa$scores, col=nmds.col, pch=20)<br />
&gt; legend(x=0.12, y=0.25, c(&#8220;Cellulose&#8221;,&#8221;Foodwaste&#8221;,&#8221;Sludge&#8221;,&#8221;Xylan&#8221;),c(&#8220;green&#8221;,&#8221;blue&#8221;,&#8221;black&#8221;,&#8221;red&#8221;),bty=&#8221;n&#8221;)</p>
<p><a href="http://www.biofacebook.com/wp-content/uploads/2012/08/pca-comp-8-31.png"><img class="alignleft size-full wp-image-591" title="pca-comp-8-31" src="http://www.biofacebook.com/wp-content/uploads/2012/08/pca-comp-8-31.png" alt="" width="750" height="400" /></a><br />
&gt; biplot(pl.STpcoa)</p>
<p><a href="http://www.biofacebook.com/wp-content/uploads/2012/08/bioCOMP.jpeg"><img class="alignleft size-full wp-image-592" title="bioCOMP" src="http://www.biofacebook.com/wp-content/uploads/2012/08/bioCOMP.jpeg" alt="" width="750" height="400" /></a><br />
##############################<br />
&gt; pl2.STpcoa&lt;-prcomp(STpcoa)<br />
&gt; class(pl2.STpcoa)<br />
[1] &#8220;prcomp&#8221;<br />
&gt; pl2.STpcoa$sd^2<br />
[1] 0.06813525 0.04069083 0.03147035 0.00000000<br />
&gt; summary(pl2.STpcoa)<br />
Importance of components:<br />
PC1 PC2 PC3 PC4<br />
Standard deviation 0.2610 0.2017 0.1774 0<br />
Proportion of Variance 0.4857 0.2900 0.2243 0<br />
Cumulative Proportion 0.4857 0.7757 1.0000 1<br />
&gt; pl2.STpcoa$x<br />
PC1 PC2 PC3 PC4<br />
Cellulose -0.02087762 -0.2346017 0.16745306 0<br />
Foodwaste -0.23459165 0.2217407 0.08580311 0<br />
Sludge 0.36888225 0.1007250 -0.01056992 0<br />
Xylan -0.11341297 -0.0878640 -0.24268626 0<br />
&gt; pl2.STpcoa$rotation<br />
PC1 PC2 PC3 PC4<br />
axis1 1.000000e+00 -9.352971e-08 -8.835157e-07 0<br />
axis2 9.353330e-08 1.000000e+00 4.064575e-06 0<br />
axis3 8.835153e-07 -4.064576e-06 1.000000e+00 0<br />
axis4 0.000000e+00 0.000000e+00 0.000000e+00 1<br />
&gt; plot(pl2.STpcoa$x, col=nmds.col, pch=20)<br />
&gt; legend(x=0.12, y=0.25, c(&#8220;Cellulose&#8221;,&#8221;Foodwaste&#8221;,&#8221;Sludge&#8221;,&#8221;Xylan&#8221;),c(&#8220;green&#8221;,&#8221;blue&#8221;,&#8221;black&#8221;,&#8221;red&#8221;),bty=&#8221;n&#8221;)</p>
<p><a href="http://www.biofacebook.com/wp-content/uploads/2012/08/pca-pc-8-31.png"><img class="alignleft size-full wp-image-593" title="pca-pc-8-31" src="http://www.biofacebook.com/wp-content/uploads/2012/08/pca-pc-8-31.png" alt="" width="750" height="400" /></a><br />
&gt; screeplot(pl2.STpcoa,type=&#8221;lines&#8221;,main=&#8221;Scree Plot&#8221;)</p>
<p><a href="http://www.biofacebook.com/wp-content/uploads/2012/08/Scree-Plot-PC-8-31.jpeg"><img class="alignleft size-full wp-image-594" title="Scree-Plot-PC-8-31" src="http://www.biofacebook.com/wp-content/uploads/2012/08/Scree-Plot-PC-8-31.jpeg" alt="" width="750" height="400" /></a><a href="http://www.biofacebook.com/wp-content/uploads/2012/08/bactera_pcoa.axes_.txt">bactera_16s_final.subsample.phylip.tre1.weighted.phylip.pcoa.axes</a></p>
]]></content:encoded>
			<wfw:commentRss>https://www.biofacebook.com/?feed=rss2&#038;p=590</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using prcomp/princomp for PCA in R （一）</title>
		<link>https://www.biofacebook.com/?p=587</link>
		<comments>https://www.biofacebook.com/?p=587#comments</comments>
		<pubDate>Fri, 31 Aug 2012 04:00:07 +0000</pubDate>
		<dc:creator><![CDATA[szypanther]]></dc:creator>
				<category><![CDATA[My project]]></category>
		<category><![CDATA[R语言]]></category>
		<category><![CDATA[统计学习]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://www.biofacebook.com/?p=587</guid>
		<description><![CDATA[<p>Difference between prcomp and princomp:</p> <p>&#8216;princomp&#8217; can only be used with more units than variables&#8221;</p> <p>prcomp是基于SVD分解（svd()函数，princomp是基于特征向量eigen()函数)</p> <p>Good video source:</p> <p>http://www.youtube.com/watch?v=oZ2nfIPdvjY</p> <p>http://www.youtube.com/watch?v=I5GxNzKLIoU&#038;feature=relmfu</p> <p>http://www.planta.cn/forum/viewtopic.php?t=16754&#038;highlight=%D3%EF%D1%D4</p> <p>###########################################</p> <p>以下所有代码包括练习数据，都可在R平台上直接运行。</p> <p>#主成分分析和主成分回归 主成分分析的思想是Pearson 1901年提出的，Hotelling 1933进一步发展 在R中，进行主成分分析用到princomp() 函数</p> <p>用法 princomp(x, cor = FALSE, scores = TRUE, covmat = NULL, subset = rep(TRUE, nrow(as.matrix(x))), &#8230;)</p> <p># 分析用数据 # cor 是否用样本的协方差矩阵作主成分分析 prcomp() 二 summary()函数 三 [...]]]></description>
				<content:encoded><![CDATA[<p>Difference between prcomp and princomp:</p>
<p>&#8216;princomp&#8217; can only be used with more units than variables&#8221;</p>
<p>prcomp是基于SVD分解（svd()函数，princomp是基于特征向量eigen()函数)</p>
<p>Good video source:</p>
<p>http://www.youtube.com/watch?v=oZ2nfIPdvjY</p>
<p>http://www.youtube.com/watch?v=I5GxNzKLIoU&#038;feature=relmfu</p>
<p>http://www.planta.cn/forum/viewtopic.php?t=16754&#038;highlight=%D3%EF%D1%D4</p>
<p>###########################################</p>
<p>以下所有代码包括练习数据，都可在R平台上直接运行。</p>
<p>#主成分分析和主成分回归<br />
主成分分析的思想是Pearson 1901年提出的，Hotelling 1933进一步发展<br />
在R中，进行主成分分析用到princomp() 函数</p>
<p>用法<br />
princomp(x, cor = FALSE, scores = TRUE, covmat = NULL,<br />
subset = rep(TRUE, nrow(as.matrix(x))), &#8230;)</p>
<p># 分析用数据<br />
# cor 是否用样本的协方差矩阵作主成分分析<br />
prcomp()<br />
二 summary()函数<br />
三 loadings()函数<br />
四 predict() 函数<br />
五 screeplot() 函数<br />
六 biplot() 函数</p>
<p>实例<br />
某中学随机抽取某年级30名学生，测量其身高，体重，胸围，坐高，针对这30名中学生身体四项指标数据做主成分分析。<br />
student&lt;-data.frame(<br />
X1=c(148, 139, 160, 149, 159, 142, 153, 150, 151, 139,<br />
140, 161, 158, 140, 137, 152, 149, 145, 160, 156,<br />
151, 147, 157, 147, 157, 151, 144, 141, 139, 148 ),<br />
X2=c(41, 34, 49, 36, 45, 31, 43, 43, 42, 31,<br />
29, 47, 49, 33, 31, 35, 47, 35, 47, 44,<br />
42, 38, 39, 30, 48, 36, 36, 30, 32, 38 ),<br />
X3=c(72, 71, 77, 67, 80, 66, 76, 77, 77, 68,<br />
64, 78, 78, 67, 66, 73, 82, 70, 74, 78,<br />
73, 73, 68, 65, 80, 74, 68, 67, 68, 70),<br />
X4=c(78, 76, 86, 79, 86, 76, 83, 79, 80, 74,<br />
74, 84, 83, 77, 73, 79, 79, 77, 87, 85,<br />
82, 78, 80, 75, 88, 80, 76, 76, 73, 78 )<br />
)<br />
#主成分分析<br />
student.pr &lt;- princomp(student, cor = TRUE)<br />
#显示结果<br />
summary(student.pr, loadings=TRUE)<br />
#预测，显示各样本主成分的值<br />
pre&lt;-predict(student.pr)<br />
#显示碎石图<br />
screeplot(student.pr,type=&#8221;lines&#8221;)<br />
# 主成分分析散点图<br />
biplot(student.pr)</p>
<p>例二<br />
对128个成年男子的身材进行测量，每人测得16项指标，身高，坐高，胸围，头高，裤长，下档，手长，领围，前胸，后背，肩厚，肩宽，袖长，肋围，腰围，腿肚，分别用X1-X16表示。16项指标的相关矩阵R。从相关矩阵出发进行主成分分析，随16项指标进行分类。<br />
命令<br />
x&lt;-c(<br />
1.00,<br />
0.79, 1.00,<br />
0.36, 0.31, 1.00,<br />
0.96, 0.74, 0.38, 1.00,<br />
0.89, 0.58, 0.31, 0.90, 1.00,<br />
0.79, 0.58, 0.30, 0.78, 0.79, 1.00,<br />
0.76, 0.55, 0.35, 0.75, 0.74, 0.73, 1.00,<br />
0.26, 0.19, 0.58, 0.25, 0.25, 0.18, 0.24, 1.00,<br />
0.21, 0.07, 0.28, 0.20, 0.18, 0.18, 0.29,-0.04, 1.00,<br />
0.26, 0.16, 0.33, 0.22, 0.23, 0.23, 0.25, 0.49,-0.34, 1.00,<br />
0.07, 0.21, 0.38, 0.08,-0.02, 0.00, 0.10, 0.44,-0.16, 0.23, 1.00,<br />
0.52, 0.41, 0.35, 0.53, 0.48, 0.38, 0.44, 0.30,-0.05, 0.50, 0.24, 1.00,<br />
0.77, 0.47, 0.41, 0.79, 0.79, 0.69, 0.67, 0.32, 0.23, 0.31, 0.10, 0.62, 1.00,<br />
0.25, 0.17, 0.64, 0.27, 0.27, 0.14, 0.16, 0.51, 0.21, 0.15, 0.31, 0.17, 0.26, 1.00,<br />
0.51, 0.35, 0.58, 0.57, 0.51, 0.26, 0.38, 0.51, 0.15, 0.29, 0.28, 0.41, 0.50, 0.63, 1.00,<br />
0.21, 0.16, 0.51, 0.26, 0.23, 0.00, 0.12, 0.38, 0.18, 0.14, 0.31, 0.18, 0.24, 0.50, 0.65, 1.00<br />
)<br />
names&lt;-c(&#8220;X1&#8243;, &#8220;X2&#8243;, &#8220;X3&#8243;, &#8220;X4&#8243;, &#8220;X5&#8243;, &#8220;X6&#8243;, &#8220;X7&#8243;, &#8220;X8&#8243;, &#8220;X9&#8243;,<br />
&#8220;X10&#8243;, &#8220;X11&#8243;, &#8220;X12&#8243;, &#8220;X13&#8243;, &#8220;X14&#8243;, &#8220;X15&#8243;, &#8220;X16&#8243;)<br />
R&lt;-matrix(0, nrow=16, ncol=16, dimnames=list(names, names))<br />
for (i in 1:16){<br />
for (j in 1:i){<br />
R&lt;-x[(i-1)*i/2+j]; R[j,i]&lt;-R<br />
}<br />
}<br />
#主成分分析<br />
pr&lt;-princomp(covmat=R)<br />
load&lt;-loadings(pr)</p>
<p>#<br />
plot(load[,1:2])<br />
text(load[,1], load[,2], adj=c(-0.4, 0.3))</p>
<p>主成分回归<br />
考虑进口总额Y与三个自变量：国内总产值，存储量，总消费量之间的关系。现收集了1949-1959共11年的数据，试做线性回归和主成分回归分析。<br />
conomy&lt;-data.frame(<br />
x1=c(149.3, 161.2, 171.5, 175.5, 180.8, 190.7, 202.1, 212.4, 226.1, 231.9, 239.0),<br />
x2=c(4.2, 4.1, 3.1, 3.1, 1.1, 2.2, 2.1, 5.6, 5.0, 5.1, 0.7),<br />
x3=c(108.1, 114.8, 123.2, 126.9, 132.1, 137.7, 146.0, 154.1, 162.3, 164.3, 167.6),<br />
y=c(15.9, 16.4, 19.0, 19.1, 18.8, 20.4, 22.7, 26.5, 28.1, 27.6, 26.3)<br />
)</p>
<p>线性回归<br />
lm.sol&lt;-lm(y~x1+x2+x3, data=conomy)<br />
summary(lm.sol)<br />
主成分回归</p>
<p># 主成分分析<br />
conomy.pr&lt;-princomp(~x1+x2+x3, data=conomy, cor=T)<br />
summary(conomy.pr, loadings=TRUE)<br />
pre&lt;-predict(conomy.pr)<br />
conomy$z1&lt;-pre[,1]; conomy$z2&lt;-pre[,2]<br />
lm.sol&lt;-lm(y~z1+z2, data=conomy)<br />
summary(lm.sol)</p>
<p><a href="http://www.biofacebook.com/wp-content/uploads/2012/08/pca_135.jpeg"><img class="alignleft size-full wp-image-588" title="pca_135" src="http://www.biofacebook.com/wp-content/uploads/2012/08/pca_135.jpeg" alt="" width="558" height="557" /></a></p>
]]></content:encoded>
			<wfw:commentRss>https://www.biofacebook.com/?feed=rss2&#038;p=587</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>4sample CA RDA analysis</title>
		<link>https://www.biofacebook.com/?p=559</link>
		<comments>https://www.biofacebook.com/?p=559#comments</comments>
		<pubDate>Thu, 30 Aug 2012 04:14:47 +0000</pubDate>
		<dc:creator><![CDATA[szypanther]]></dc:creator>
				<category><![CDATA[My project]]></category>
		<category><![CDATA[R语言]]></category>
		<category><![CDATA[统计学习]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://www.biofacebook.com/?p=559</guid>
		<description><![CDATA[<p>&#160;</p> <p>&#62; gtsdata_test=read.table(&#8220;gtsdata.txt&#8221;, header=T) &#62; gtsenv=read.table(&#8220;gtsenv.txt&#8221;, header=T) &#62; gtsdata_data_t&#60;-t(gtsdata_data) &#62; decorana(gtsdata_data_t)</p> <p>Call: decorana(veg = gtsdata_data_t)</p> <p>Detrended correspondence analysis with 26 segments. Rescaling of axes with 4 iterations.</p> <p>DCA1 DCA2 DCA3 DCA4 Eigenvalues 0.8634 0.4834 0.23788 0 Decorana values 0.8721 0.3793 0.07223 0 Axis lengths 5.3292 2.1115 1.80907 0</p> <p>&#62; gts.ca=cca(gtsdata_data_t) &#62; gts.ca Call: cca(X = [...]]]></description>
				<content:encoded><![CDATA[<p>&nbsp;</p>
<p>&gt; gtsdata_test=read.table(&#8220;gtsdata.txt&#8221;, header=T)<br />
&gt; gtsenv=read.table(&#8220;gtsenv.txt&#8221;, header=T)<br />
&gt; gtsdata_data_t&lt;-t(gtsdata_data)<br />
&gt; decorana(gtsdata_data_t)</p>
<p>Call:<br />
decorana(veg = gtsdata_data_t)</p>
<p>Detrended correspondence analysis with 26 segments.<br />
Rescaling of axes with 4 iterations.</p>
<p>DCA1 DCA2 DCA3 DCA4<br />
Eigenvalues 0.8634 0.4834 0.23788 0<br />
Decorana values 0.8721 0.3793 0.07223 0<br />
Axis lengths 5.3292 2.1115 1.80907 0</p>
<p>&gt; gts.ca=cca(gtsdata_data_t)<br />
&gt; gts.ca<br />
Call: cca(X = gtsdata_data_t)</p>
<p>Inertia Rank<br />
Total 1.653<br />
Unconstrained 1.653 3<br />
Inertia is mean squared contingency coefficient</p>
<p>Eigenvalues for unconstrained axes:<br />
CA1 CA2 CA3<br />
0.8721 0.5037 0.2776</p>
<p>&gt; plot(gts.ca,scaling=3)</p>
<p><a href="http://www.biofacebook.com/wp-content/uploads/2012/08/4sampleCA-analysis.png"><img class="alignleft size-full wp-image-560" title="4sampleCA-analysis" src="http://www.biofacebook.com/wp-content/uploads/2012/08/4sampleCA-analysis.png" alt="" width="750" height="400" /></a></p>
<p>&gt; gtsdata_data_t_del&lt;-gtsdata_data_t[1:3,]<br />
&gt; gtsdata_data_t_del</p>
<p>&gt; gts.rda=rda(gtsdata_data_t_del,gtsenv)<br />
&gt; gts.rda<br />
Call: rda(X = gtsdata_data_t_del, Y = gtsenv)</p>
<p>Inertia Proportion Rank<br />
Total 101790 1<br />
Constrained 101790 1 2<br />
Unconstrained 0 0 0<br />
Inertia is variance<br />
Some constraints were aliased because they were collinear (redundant)</p>
<p>Eigenvalues for constrained axes:<br />
RDA1 RDA2<br />
81240 20549</p>
<p>plot(gts.rda,display=c(&#8220;sp&#8221;,&#8221;bp&#8221;,&#8221;si&#8221;),scaling=3)</p>
<p><a href="http://www.biofacebook.com/wp-content/uploads/2012/08/3sampleRDA-analysis.png"><img class="alignleft size-full wp-image-561" title="3sampleRDA-analysis" src="http://www.biofacebook.com/wp-content/uploads/2012/08/3sampleRDA-analysis.png" alt="" width="750" height="400" /></a></p>
<p><a href="http://www.biofacebook.com/wp-content/uploads/2012/08/gtsenv2.txt">gtsenv.txt</a></p>
<p><a href="http://www.biofacebook.com/wp-content/uploads/2012/08/4sample_R_cluster.csv">gtsdata.txt</a></p>
]]></content:encoded>
			<wfw:commentRss>https://www.biofacebook.com/?feed=rss2&#038;p=559</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>R 中字符矩阵转化为数值矩阵</title>
		<link>https://www.biofacebook.com/?p=554</link>
		<comments>https://www.biofacebook.com/?p=554#comments</comments>
		<pubDate>Thu, 30 Aug 2012 01:34:48 +0000</pubDate>
		<dc:creator><![CDATA[szypanther]]></dc:creator>
				<category><![CDATA[R语言]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.biofacebook.com/?p=554</guid>
		<description><![CDATA[<p>a.str &#60;- matrix(c(&#8216;1&#8242;,&#8217;2&#8242;,&#8217;3&#8242;,&#8217;5&#8242;,NA,&#8217;6&#8242;) + ,c(2,3),dimnames = list(c(&#8216;g1&#8242;,&#8217;g2&#8242;),c(&#8216;t1&#8242;,&#8217;t2&#8242;,&#8217;t3&#8242;)))</p> <p>a.str # t1 t2 t3 # g1 &#8220;1&#8221; &#8220;3&#8221; NA # g2 &#8220;2&#8221; &#8220;5&#8221; &#8220;6&#8221;</p> <p>a.num &#60;- apply(a.str, c(1,2), as.numeric)</p> <p>a.num # t1 t2 t3 # g1 1 3 NA # g2 2 5 6</p> <p>Note: 第一行，第一列位置要为空！！</p> ]]></description>
				<content:encoded><![CDATA[<p>a.str &lt;- matrix(c(&#8216;1&#8242;,&#8217;2&#8242;,&#8217;3&#8242;,&#8217;5&#8242;,NA,&#8217;6&#8242;)<br />
+ ,c(2,3),dimnames = list(c(&#8216;g1&#8242;,&#8217;g2&#8242;),c(&#8216;t1&#8242;,&#8217;t2&#8242;,&#8217;t3&#8242;)))</p>
<p>a.str<br />
#      t1  t2  t3<br />
# g1 &#8220;1&#8221; &#8220;3&#8221; NA<br />
# g2 &#8220;2&#8221; &#8220;5&#8221; &#8220;6&#8221;</p>
<p>a.num &lt;- apply(a.str, c(1,2), as.numeric)</p>
<p>a.num<br />
#    t1 t2 t3<br />
# g1  1  3 NA<br />
# g2  2  5  6</p>
<p>Note: 第一行，第一列位置要为空！！</p>
]]></content:encoded>
			<wfw:commentRss>https://www.biofacebook.com/?feed=rss2&#038;p=554</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>基于Vegan 软件包的生态学数据排序分析学习</title>
		<link>https://www.biofacebook.com/?p=531</link>
		<comments>https://www.biofacebook.com/?p=531#comments</comments>
		<pubDate>Tue, 28 Aug 2012 04:40:20 +0000</pubDate>
		<dc:creator><![CDATA[szypanther]]></dc:creator>
				<category><![CDATA[My project]]></category>
		<category><![CDATA[R语言]]></category>
		<category><![CDATA[统计学习]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://www.biofacebook.com/?p=531</guid>
		<description><![CDATA[<p>“基于Vegan 软件包的生态学数据排序分析 赖江山 米湘成 (中国科学院植物研究所植被与环境变化国家重点实验室，北京 100093) 摘要：群落学数据一般是多维数据，例如物种属性或环境因子的属性。多元统计分析是群落生态学常用的分析方法，排序（ordination）是多元统计最常用的方法之一。CANOCO是广泛使用的排序软件，但缺点是商业软件价格不菲，版本更新速度也很慢。近年来，R语言以其灵活、开放、易于掌握、免费等诸多优点，在生态学和生物多样性研究领域迅速赢得广大研究人员的青睐。R语言中的外在软件包“Vegan”是专门用于群落生态学分析的工具。Vegan能够提供所有基本的排序方法，同时具有生成精美排序图的功能，版本更新很快。我们认为Vegan包完全可以取代CANOCO，成为今后排序分析的首选统计工具。本文首先简述排序的原理和类型，然后介绍Vegan的基本信息和下载安装过程，最后以古田山24公顷样地内随机抽取40个20m×20m的样方为例，展示Vegan包内各种常用排序方法（PCA,RDA,CA和CCA）和排序图生成过程，希望能为R的初学者尽快熟悉并利用Vegan包进行排序分析提供参考。</p> <p>gtsdata</p> <p>gtsenv.txt</p> <p>赖江山.pdf</p> &#62; setwd("/winxp_disk2/shenzy/R/Vegan") &#62; gtsdata=read.table("gtsdata.txt", header=T) &#62; gtsenv=read.table("gtsenv.txt", header=T) &#62; install.packages("vegan") Installing package(s) into ‘/home/shenzy/R/x86_64-pc-linux-gnu-library/2.15’ (as ‘lib’ is unspecified) 试开URL’http://cran.csiro.au/src/contrib/vegan_2.0-4.tar.gz' Content type 'application/x-gzip' length 1576584 bytes (1.5 Mb) 打开了URL ================================================== downloaded 1.5 Mb * installing *source* package ‘vegan’ ... ** 成功将‘vegan’程序包解包并MD5和检查 ** libs gfortran -fpic -O3 [...]]]></description>
				<content:encoded><![CDATA[<p>“基于Vegan 软件包的生态学数据排序分析<br />
赖江山 米湘成<br />
(中国科学院植物研究所植被与环境变化国家重点实验室，北京 100093)<br />
摘要：群落学数据一般是多维数据，例如物种属性或环境因子的属性。多元统计分析是群落生态学常用的分析方法，排序（ordination）是多元统计最常用的方法之一。CANOCO是广泛使用的排序软件，但缺点是商业软件价格不菲，版本更新速度也很慢。近年来，R语言以其灵活、开放、易于掌握、免费等诸多优点，在生态学和生物多样性研究领域迅速赢得广大研究人员的青睐。R语言中的外在软件包“Vegan”是专门用于群落生态学分析的工具。Vegan能够提供所有基本的排序方法，同时具有生成精美排序图的功能，版本更新很快。我们认为Vegan包完全可以取代CANOCO，成为今后排序分析的首选统计工具。本文首先简述排序的原理和类型，然后介绍Vegan的基本信息和下载安装过程，最后以古田山24公顷样地内随机抽取40个20m×20m的样方为例，展示Vegan包内各种常用排序方法（PCA,RDA,CA和CCA）和排序图生成过程，希望能为R的初学者尽快熟悉并利用Vegan包进行排序分析提供参考。</p>
<p><a href="http://www.biofacebook.com/wp-content/uploads/2012/08/gtsdata.txt">gtsdata</a></p>
<p><a href="http://www.biofacebook.com/wp-content/uploads/2012/08/gtsenv1.txt">gtsenv.txt</a></p>
<p><a href="http://www.biofacebook.com/wp-content/uploads/2012/08/赖江山.pdf">赖江山.pdf</a></p>
<pre>&gt; setwd("/winxp_disk2/shenzy/R/Vegan")
&gt; gtsdata=read.table("gtsdata.txt", header=T)
&gt; gtsenv=read.table("gtsenv.txt", header=T)
&gt; install.packages("vegan")
Installing package(s) into ‘/home/shenzy/R/x86_64-pc-linux-gnu-library/2.15’
(as ‘lib’ is unspecified)
试开URL’http://cran.csiro.au/src/contrib/vegan_2.0-4.tar.gz'
Content type 'application/x-gzip' length 1576584 bytes (1.5 Mb)
打开了URL
==================================================
downloaded 1.5 Mb

* installing *source* package ‘vegan’ ...
** 成功将‘vegan’程序包解包并MD5和检查
** libs
gfortran   -fpic  -O3 -pipe  -g  -c cepin.f -o cepin.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG      -fpic  -O3 -pipe  -g  -c data2hill.c -o data2hill.o
gfortran   -fpic  -O3 -pipe  -g  -c decorana.f -o decorana.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG      -fpic  -O3 -pipe  -g  -c goffactor.c -o goffactor.o
gfortran   -fpic  -O3 -pipe  -g  -c monoMDS.f -o monoMDS.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG      -fpic  -O3 -pipe  -g  -c nestedness.c -o nestedness.o
gfortran   -fpic  -O3 -pipe  -g  -c ordering.f -o ordering.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG      -fpic  -O3 -pipe  -g  -c pnpoly.c -o pnpoly.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG      -fpic  -O3 -pipe  -g  -c stepacross.c -o stepacross.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG      -fpic  -O3 -pipe  -g  -c vegdist.c -o vegdist.o
gcc -std=gnu99 -shared -o vegan.so cepin.o data2hill.o decorana.o goffactor.o monoMDS.o nestedness.o ordering.o pnpoly.o stepacross.o vegdist.o -lgfortran -lm -lquadmath -L/usr/lib/R/lib -lR
安装至 /home/shenzy/R/x86_64-pc-linux-gnu-library/2.15/vegan/libs
** R
** data
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
   ‘decision-vegan.Rnw’
   ‘diversity-vegan.Rnw’ using ‘UTF-8’
   ‘intro-vegan.Rnw’ using ‘UTF-8’
** testing if installed package can be loaded

* DONE (vegan)

The downloaded source packages are in
	‘/tmp/RtmpmtXtEK/downloaded_packages’
&gt; library(vegan)
载入需要的程辑包：permute

载入程辑包：‘permute’

The following object(s) are masked from ‘package:gtools’:

    permute

This is vegan 2.0-4
&gt; decorana(gtsdata)

Call:
decorana(veg = gtsdata) 

Detrended correspondence analysis with 26 segments.
Rescaling of axes with 4 iterations.

                  DCA1   DCA2    DCA3    DCA4
Eigenvalues     0.3939 0.2239 0.09555 0.06226
Decorana values 0.5025 0.1756 0.06712 0.03877
Axis lengths    3.2595 2.5130 1.21445 1.00854

&gt; gts.pca=rda(gtsdata)
&gt; gts.pca
Call: rda(X = gtsdata)

              Inertia Rank
Total           352.1
Unconstrained   352.1   22
Inertia is variance 

Eigenvalues for unconstrained axes:
    PC1     PC2     PC3     PC4     PC5     PC6     PC7     PC8
111.779  73.580  54.607  32.959  26.481  18.063  12.763   7.637
(Showed only 8 of all 22 unconstrained eigenvalues)

Note: 通过以上命令选择排序模型（线性模型PCA、RDA或单峰模型CA、CCA），因为Axis lengths 等同于CANOCO中的DCA分析，
DCA排序数值最大max&gt;4选单峰，&lt;3 选线性模型， 3&lt;max&lt;4 则都二者都行，这里3.2595属于此情况！总的特征根为352.1，即物种分布
的总变化量。PCA排序为每个抽所能解释的方差变化量。对于第一抽而言，111.779/352.1=31.7% 为其物种分布的解释量。</pre>
<pre>&gt; plot(gts.pca)
<a href="http://www.biofacebook.com/wp-content/uploads/2012/08/Rplot0.png"><img class="alignleft size-full wp-image-538" title="Rplot0" src="http://www.biofacebook.com/wp-content/uploads/2012/08/Rplot0.png" alt="" width="750" height="400" /></a></pre>
<p>&nbsp;</p>
<pre> &gt;biplot(gts.pca)
<a href="http://www.biofacebook.com/wp-content/uploads/2012/08/Rplot1.png"><img class="alignleft size-full wp-image-539" title="Rplot1" src="http://www.biofacebook.com/wp-content/uploads/2012/08/Rplot1.png" alt="" width="750" height="400" /></a></pre>
<p>因以上重叠现象严重，原因是物种分布差异打，分布不均匀的物种占据了大部分排序空间，可对物种数据进行单位方差标准化。通过scale参数实现，如下：</p>
<pre>&gt; gts.pca=rda(gtsdata, scale=T)
&gt; biplot(gts.pca,scaling=3)
<a href="http://www.biofacebook.com/wp-content/uploads/2012/08/Rplot2.png"><img class="alignleft size-full wp-image-542" title="Rplot2" src="http://www.biofacebook.com/wp-content/uploads/2012/08/Rplot2.png" alt="" width="750" height="400" /></a></pre>
<pre>Note:scaling=1 关注物种间关系</pre>
<pre>scaling=2 关注样方之间关系
scaling=3 关注样方与物种之间关系</pre>
<pre>&gt; biplot(gts.pca,display="sp")</pre>
<pre>&gt; biplot(gts.pca,display="si")</pre>
<pre>&gt; biplot(gts.pca,display="sp", choices=c(1,3))
<a href="http://www.biofacebook.com/wp-content/uploads/2012/08/Rplot3.png"><img class="alignleft size-full wp-image-544" title="Rplot3" src="http://www.biofacebook.com/wp-content/uploads/2012/08/Rplot3.png" alt="" width="750" height="400" /></a></pre>
<pre></pre>
<p>CA分析：</p>
<pre>&gt; gts.ca=cca(gtsdata)
&gt; gts.ca
Call: cca(X = gtsdata)

              Inertia Rank
Total           1.424
Unconstrained   1.424   21
Inertia is mean squared contingency coefficient 

Eigenvalues for unconstrained axes:
    CA1     CA2     CA3     CA4     CA5     CA6     CA7     CA8
0.50253 0.26564 0.14023 0.10502 0.09127 0.05540 0.05063 0.04204
(Showed only 8 of all 21 unconstrained eigenvalues)</pre>
<pre>&gt; plot(gts.ca,scaling=3)
<a href="http://www.biofacebook.com/wp-content/uploads/2012/08/Rplot4.png"><img class="alignleft size-full wp-image-546" title="Rplot4" src="http://www.biofacebook.com/wp-content/uploads/2012/08/Rplot4.png" alt="" width="750" height="400" /></a></pre>
<pre>从CA解读即：如某一个物种靠近某个样方，表明该物种可能对样方位置起很大作用。从图可以看出20号样方与短柄饱（QUESER）很近。同时19与20号样方距离近，表明物种组结构特征也近！而只有少数样方出现的物种，如CASCAR，通常在排序空间边缘，表明只偶然发生。该列对应样方数值都很小或0！对在排序中心的物种，可能在取样区域是其最优分布。对应该列（CASERY）数值较大而多！</pre>
<pre>
RDA分析（多个矩阵分析）：</pre>
<pre>&gt; gts.rda=rda(gtsdata,gtsenv)
&gt; gts.rda
Call: rda(X = gtsdata, Y = gtsenv)

               Inertia Proportion Rank
Total         352.0917     1.0000
Constrained   137.4026     0.3902    8
Unconstrained 214.6891     0.6098   22
Inertia is variance 

Eigenvalues for constrained axes:
   RDA1    RDA2    RDA3    RDA4    RDA5    RDA6    RDA7    RDA8
56.3864 42.7769 17.8270 13.5066  2.5020  2.1217  1.6616  0.6203 

Eigenvalues for unconstrained axes:
   PC1    PC2    PC3    PC4    PC5    PC6    PC7    PC8
72.287 54.891 26.618 17.959 12.730  9.918  5.659  5.349
(Showed only 8 of all 22 unconstrained eigenvalues)</pre>
<pre>plot(gts.rda,display=c("sp","bp","si"),scaling=3)
<a href="http://www.biofacebook.com/wp-content/uploads/2012/08/Rplot6.png"><img class="alignleft size-full wp-image-548" title="Rplot6" src="http://www.biofacebook.com/wp-content/uploads/2012/08/Rplot6.png" alt="" width="750" height="400" /></a></pre>
<pre>在RDA排序中，箭头连线长度代表某个</pre>
<pre>环境因子与群落分布和种类分布间相关
程度的大小，越长相关性越大。
箭头连线和排序抽的夹角代表某个环境因子
与排序抽的相关性大小，越小相关性越大！</pre>
<pre>&gt; gts.prda=rda(gtsdata,gtsenv[,1:4], gtsenv[,5:8])
&gt; gts.prda
Call: rda(X = gtsdata, Y = gtsenv[, 1:4], Z = gtsenv[, 5:8])

               Inertia Proportion Rank
Total         352.0917     1.0000
Conditional    95.0318     0.2699    4
Constrained    42.3708     0.1203    4
Unconstrained 214.6891     0.6098   22
Inertia is variance 

Eigenvalues for constrained axes:
  RDA1   RDA2   RDA3   RDA4
27.522  9.087  3.442  2.320 

Eigenvalues for unconstrained axes:
   PC1    PC2    PC3    PC4    PC5    PC6    PC7    PC8
72.287 54.891 26.618 17.959 12.730  9.918  5.659  5.349
(Showed only 8 of all 22 unconstrained eigenvalues)
Note: gtsenv[,1:4]表示环境矩阵只取前4列，即地形因子。Constrained为42.37除以
352.09=12.03%，表示地形因子单独所能解释的特征根占总特征根的百分比。Y，Z调换下，
可得土壤因子单独的解释量，2者总共的解释量前面已经算出，即为39.02%。所以2组环境
变量共同的解释量为39.02%-15.53%-12.03%=11.46%!

CCA分析类似
&gt;gts.cca=cca(gtsdata,gtsenv)</pre>
<pre></pre>
]]></content:encoded>
			<wfw:commentRss>https://www.biofacebook.com/?feed=rss2&#038;p=531</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>R OTU heatmap2</title>
		<link>https://www.biofacebook.com/?p=521</link>
		<comments>https://www.biofacebook.com/?p=521#comments</comments>
		<pubDate>Thu, 23 Aug 2012 07:24:35 +0000</pubDate>
		<dc:creator><![CDATA[szypanther]]></dc:creator>
				<category><![CDATA[My project]]></category>
		<category><![CDATA[R语言]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://www.biofacebook.com/?p=521</guid>
		<description><![CDATA[<p>source(&#8220;http://www.bioconductor.org/biocLite.R&#8221;); biocLite(&#8220;affy&#8221;); biocLite(&#8220;Biobase&#8221;); library(affy); library(Biobase);</p> <p>&#62;bac_4sampledata=read.csv(&#8220;/home/R_heatmap/4sample_R_cluster.csv&#8221;, sep=&#8221;\t&#8221;) &#62; row.names(bac_4sampledata)&#60;-bac_4sampledata$Group &#62; bac_4sample_Datamatrix&#60;-data.matrix(bac_4sampledata[,2:5]) &#62; heatmap.2(bac_4sample_Datamatrix, distfun=dist,col=greenred(256), scale=&#8221;row&#8221;, key=TRUE, symkey=FALSE, density.info=&#8221;none&#8221;, trace=&#8221;none&#8221;, cexRow=0.5, cexCol=0.7,margin=c(7,30), keysize=1.5);</p> <p>4sample_R_cluster_stdtop100</p> &#62; heatmap.2(bac_4sample_Datamatrix, distfun = function(x) dist(x,method = 'euclidean'),hclustfun = function(x) hclust(x,method = 'centroid'),col=greenred(256), scale="row", key=TRUE, symkey=FALSE, density.info="none", trace="none", cexRow=0.5, cexCol=0.7,margin=c(7,30), keysize=1.5); [...]]]></description>
				<content:encoded><![CDATA[<p>source(&#8220;http://www.bioconductor.org/biocLite.R&#8221;);<br />
biocLite(&#8220;affy&#8221;);<br />
biocLite(&#8220;Biobase&#8221;);<br />
library(affy);<br />
library(Biobase);</p>
<p>&gt;bac_4sampledata=read.csv(&#8220;/home/R_heatmap/4sample_R_cluster.csv&#8221;, sep=&#8221;\t&#8221;)<br />
&gt; row.names(bac_4sampledata)&lt;-bac_4sampledata$Group<br />
&gt; bac_4sample_Datamatrix&lt;-data.matrix(bac_4sampledata[,2:5])<br />
&gt; heatmap.2(bac_4sample_Datamatrix, distfun=dist,col=greenred(256), scale=&#8221;row&#8221;, key=TRUE, symkey=FALSE, density.info=&#8221;none&#8221;, trace=&#8221;none&#8221;, cexRow=0.5, cexCol=0.7,margin=c(7,30), keysize=1.5);</p>
<p><a href="http://www.biofacebook.com/wp-content/uploads/2012/08/4sample_R_cluster_stdtop100.csv">4sample_R_cluster_stdtop100</a></p>
<pre><a href="http://www.biofacebook.com/wp-content/uploads/2012/08/4STbac-sample-heatmap.png"><img class="alignleft size-full wp-image-522" title="4STbac-sample-heatmap" src="http://www.biofacebook.com/wp-content/uploads/2012/08/4STbac-sample-heatmap.png" alt="" width="750" height="500" /></a></pre>
<pre> &gt; heatmap.2(bac_4sample_Datamatrix, distfun = function(x) dist(x,method = 'euclidean'),hclustfun = function(x) hclust(x,method = 'centroid'),col=greenred(256), scale="row", key=TRUE, symkey=FALSE, density.info="none", trace="none", cexRow=0.5, cexCol=0.7,margin=c(7,30), keysize=1.5);</pre>
<pre><a href="http://www.biofacebook.com/wp-content/uploads/2012/08/4dsampleRplot.png"><img class="alignleft size-full wp-image-526" title="4dsampleRplot" src="http://www.biofacebook.com/wp-content/uploads/2012/08/4dsampleRplot.png" alt="" width="750" height="500" /></a></pre>
]]></content:encoded>
			<wfw:commentRss>https://www.biofacebook.com/?feed=rss2&#038;p=521</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Change heatmap.2 defaults  dist for calculating the distance matrix and hclust for clustering</title>
		<link>https://www.biofacebook.com/?p=512</link>
		<comments>https://www.biofacebook.com/?p=512#comments</comments>
		<pubDate>Wed, 22 Aug 2012 07:58:43 +0000</pubDate>
		<dc:creator><![CDATA[szypanther]]></dc:creator>
				<category><![CDATA[R语言]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://www.biofacebook.com/?p=512</guid>
		<description><![CDATA[<p>Glancing at the code for heatmap.2 I&#8217;m fairly sure that the default is to use dist, and it&#8217;s default is in turn to use euclidean distances.</p> <p>The reason your attempt at passing distfun = dist(method = 'euclidean') didn&#8217;t work is thatdistfun (and hclustfun) are supposed to simply be name of functions. So if you want [...]]]></description>
				<content:encoded><![CDATA[<p>Glancing at the code for <code>heatmap.2</code> I&#8217;m fairly sure that the default is to use <code>dist</code>, and it&#8217;s default is in turn to use euclidean distances.</p>
<p>The reason your attempt at passing <code>distfun = dist(method = 'euclidean')</code> didn&#8217;t work is that<code>distfun</code> (and <code>hclustfun</code>) are supposed to simply be <em>name</em> of functions. So if you want to alter defaults and pass arguments you need to write a wrapper function like this:</p>
<pre><code>heatmap.2(...,hclustfun = function(x) hclust(x,method = 'centroid'),...) </code></pre>
<p>As I mentioned, I&#8217;m fairly certain that <code>heatmap.2</code> is using euclidean distances by default, but a similar solution can be used to alter the distance function used:</p>
<pre><code>heatmap.2(...,distfun = function(x) dist(x,method = 'euclidean'),...)</code></pre>
<pre></pre>
<pre><code>library("gplots")
library("RColorBrewer")

test &lt;- matrix(c(79,38.6,30.2,10.8,22,
81,37.7,28.4,9.7,19.9,
82,36.2,26.8,9.8,20.9,
74,29.9,17.2,6.1,13.9,
81,37.4,20.5,6.7,14.6),ncol=5,byrow=TRUE)
colnames(test) &lt;- c("18:0","18:1","18:2","18:3","20:0")
rownames(test) &lt;- c("Sample 1","Sample 2","Sample 3", "Sample 4","Sample 5")
test &lt;- as.table(test)
mat=data.matrix(test)

heatmap.2(mat,
dendrogram="row",
Rowv=TRUE,
Colv=NULL,
distfun = dist,
hclustfun = hclust,
xlab = "Lipid Species",
ylab = NULL,
colsep=c(1),
sepcolor="black",
key=TRUE,
keysize=1,
trace="none",
density.info=c("none"),
margins=c(8, 12),
col=bluered
)</code></pre>
]]></content:encoded>
			<wfw:commentRss>https://www.biofacebook.com/?feed=rss2&#038;p=512</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
