小生这厢有礼了(BioFaceBook Personal Blog) » R

R: combining columns freely with data.frame(x, y)

szypanther — Thu, 24 Jan 2013 05:02:13 +0000

> protein_data_B30_min<-protein_data[1:2548,10:12]
> protein_data_M30_min<-protein_data[1:2548,19:21]
> protein_data_30_min<-data.frame(protein_data_B30_min,protein_data_M30_min)
> protein_data_30_min[1:2,]
B30 B30.1 B30.2 Mgo30 Mgo30.1 Mgo30.2
1 870.5042 867.0873 0 1086.828 1481.228 2726.929
2 5167.6455 4646.3450 0 4409.320 3017.866 3216.642
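
The same pattern on a small made-up table, as a minimal sketch: the `toy` data frame and its values are hypothetical stand-ins for protein_data. It also shows that indexing both column ranges in one step gives the same values as gluing the two pieces together with data.frame().

```r
# Toy stand-in for protein_data (hypothetical values, not the real data set)
toy <- data.frame(
  id      = 1:4,
  B30     = c(870.5, 5167.6, 12.0, 3.4),
  B30.1   = c(867.1, 4646.3, 10.5, 2.9),
  other   = c(1, 2, 3, 4),
  Mgo30   = c(1086.8, 4409.3, 20.1, 5.5),
  Mgo30.1 = c(1481.2, 3017.9, 18.7, 6.1)
)

# Two column blocks over the same rows, then glued side by side
block_b  <- toy[1:3, 2:3]
block_m  <- toy[1:3, 5:6]
combined <- data.frame(block_b, block_m)

# One-step alternative: select both column ranges at once
direct <- toy[1:3, c(2:3, 5:6)]

combined
```

Both routes keep the original column names as long as they are unique; cbind(block_b, block_m) would work here as well.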

Drawing a Venn diagram in R

szypanther — Tue, 15 Jan 2013 07:25:18 +0000

> install.packages('plotrix') 
Installing package(s) into ‘/home/shenzy/R/x86_64-pc-linux-gnu-library/2.15’
(as ‘lib’ is unspecified)
trying URL 'http://cran.csiro.au/src/contrib/plotrix_3.4-5.tar.gz'
Content type 'application/x-gzip' length 211113 bytes (206 Kb)
opened URL
==================================================
downloaded 206 Kb

* installing *source* package ‘plotrix’ ...
** package ‘plotrix’ successfully unpacked and MD5 sums checked
** R
** data
** demo
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded

* DONE (plotrix)

The downloaded source packages are in
	‘/tmp/RtmpQVXKKd/downloaded_packages’

library(plotrix)
# empty canvas: no points, no axes, no axis labels
plot(0:10,seq(0,10,length=11),type='n',axes=F,xlab='',ylab='')
# two overlapping, semi-transparent circles
draw.circle(2,5,2,col=rgb(154/255,0/255,205/255,0.6))
draw.circle(4,5,2,col=rgb(21/255,3/255,252/255,0.6))
# percentages for the left-only, right-only and shared regions
text(1,5,labels='10.12%',col='white',font=2)
text(5,5,labels='40.38%',col='white',font=2)
text(3,5,labels='49.5%',col='white',font=2)
legend(6.2,5,pch=15, xjust=0,yjust=0.5,bty='n',cex=1.3,col=c(rgb(154/255,0/255,205/255),rgb(74/255,2/255,233/255),rgb(21/255,3/255,252/255) ), legend=c('sample 1 uniq','sample 1 & sample 2','sample 2 uniq'))
text(3.5,7.5,labels='Venn chart for uniq_sRNAs',font=2,cex=1.5)

Using prcomp/princomp for PCA in R (3)

szypanther — Fri, 31 Aug 2012 04:43:58 +0000

Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

1. Briefly about PCA
2. The modules/functions that implement PCA in GRASS & R
3. My claims (Entitled Comments)
4. Evidence (=the numbers derived from i.pca, prcomp, princomp,
m.eigensystem using some MODIS surface reflectance bands).

Finally all is clear _but_ one thing: the only “unknown” variable (to
me) is still the Eigenvalue provided by i.pca. I can’t nail this one:
*** how is it calculated? *** Looks like it is some _weighted_
variance… !??

The only thing I noticed with some testing (not thoroughly though) is
that the square root of the Eigenvalue (=Variance) reported by “i.pca” is
~0.58 of the standard deviation (sdev) reported by “prcomp()” !? That is:

sqrt(Eigenvalue(i.pca)) / sdev(prcomp) = 0.58xxx
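
As a quick check of that ratio, here is a small sketch that only reuses numbers printed further down in this post (the i.pca eigenvalues and the prcomp(center=FALSE, scale=FALSE) standard deviations). The variable names are made up for the illustration.

```r
# Eigenvalues printed by i.pca (PC1..PC3), copied from the output in this post
eig_ipca <- c(6307563.04, 78023.63, 4504.60)

# Standard deviations printed by prcomp(mod07, center=FALSE, scale=FALSE)
sdev_prcomp <- c(4288.3788, 476.8904, 114.3971)

# Ratio on the standard-deviation scale, one value per component
ratio_sd <- sqrt(eig_ipca) / sdev_prcomp

# The same comparison on the variance scale is simply the square of it
ratio_var <- eig_ipca / sdev_prcomp^2

ratio_sd
ratio_var
```

All three components give nearly the same ratio, which suggests a constant factor rather than a per-component weighting, but that is only an observation from these numbers, not an explanation of what i.pca does internally.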

Hamish, if you have the time (or anyone else), could you please “translate”
the code in “grass6_dev/imagery/i.pca/main.c”, concerning the “eigval”
variable, into some pseudocode? It is really the last thing to clarify (I
think).

Kindest regards, Nikos

------------------------
# 1. PCA ### ### ### ###

*. PCA is performed via:

– method 1 (eigenvectors) : by determining the eigenvalues and
eigenvectors of a given covariance or correlation matrix.

– covariance matrix -> non-standardised
– correlation matrix -> standardised

– method 2 (SVD): based on a singular value decomposition of the
data matrix. This is a more general solution to PCA than the
“eigenvectors” method.

…The difference between the two methods is explained in [1]…
…SVD is used for numerical accuracy… (R Documentation)
…Results between the two methods can differ a bit…

…Data centering reduces the mean square error of
*approximating* the input data
…Data scaling

*. PCA returns:

– eigenvalues (=variance _or_ sdev^2)
– eigenvectors (=loadings, rotation, weighting coefficient)
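
To make the two routes concrete, here is a minimal sketch in R. It uses the built-in USArrests data purely as a stand-in (the MODIS bands from this post are not available here), and does both computations by hand rather than through prcomp()/princomp().

```r
X  <- as.matrix(USArrests)                    # built-in data, illustration only
n  <- nrow(X)
Xc <- scale(X, center = TRUE, scale = FALSE)  # centre the columns, no scaling

# method 1: eigen-decomposition of the covariance matrix
e <- eigen(cov(X))

# method 2: singular value decomposition of the centred data matrix
s <- svd(Xc)

var_eigen <- e$values            # eigenvalues = variances of the components
var_svd   <- s$d^2 / (n - 1)     # squared singular values, rescaled by n - 1

# Eigenvectors and right singular vectors agree up to the sign of each column
max_abs_diff <- max(abs(abs(e$vectors) - abs(s$v)))

rbind(var_eigen, var_svd)
max_abs_diff
```

On well-conditioned data like this the two methods agree to rounding error; the sign of each column is arbitrary in both.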

-------------------------------------------
# 2. Implementations of PCA ### ### ### ###

* R’s implementation of:

– method 1 (eigen) is the _princomp()_ function

-*applies*:
\data centering
\data scaling (only with cor=TRUE)

-returns:
\sdev – standard deviation of principal components (that is:
the sqrt(Variance))
\loadings – eigenvectors

– method 2 (SVD) is the _prcomp()_ function

-options:
\data centering (center= TRUE | FALSE) [default is TRUE]
\data scaling (scale=TRUE | FALSE) [default is FALSE]

-returns:
\sdev (as above)
\rotation (same as loadings)

* GRASS’ implementation of:

– method 1 (eigen) is the _m.eigensystem_ module

-returns:
E -> eigenvalue -> Variance(E) -> sdev(E)^2
V -> eigenvectors associated with E
N -> normalized eigenvectors (V rescaled to unit length)
W -> N vector multiplied by the square root of the magnitude of
the eigenvalue (E), in other words: W = N * sdev(E)

– method 2 (SVD) is the _i.pca_ module

-returns:
\Eigenvalues (latest version by Hamish)
\Eigenvectors

-----------------------------
# 3. Comments ### ### ### ###

# 1: it is (?) meaningless to compare the numerical results
(eigenvectors) of “i.pca” with the results of “m.eigensystem” since they
implement two different algebraic solutions to PCA with different
options (data centering is NOT performed by i.pca). The different
algebraic solutions don’t give big differences. Data centering does.

# 2: “i.pca” corresponds to “prcomp( ,center=FALSE,scale=FALSE)” and
“m.eigensystem” corresponds to “princomp()”

# 3: The standard deviations (sdev) reported by prcomp(,center=TRUE,
scale=FALSE) _match_ the variances (=eigenvalues = sdev^2) reported by
“m.eigensystem”.

# 4: Obviously, “prcomp()” with its defaults (center=TRUE, scale=FALSE)
gives (almost) the same results as “princomp()”.
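
One documented reason for the "almost" in comment 4 is the divisor: princomp() computes variances with divisor n, prcomp() with divisor n - 1. A minimal sketch, again with the built-in USArrests data as a stand-in for mod07:

```r
X <- USArrests            # built-in data, illustration only
n <- nrow(X)

sd_prcomp   <- prcomp(X)$sdev             # variance divisor n - 1
sd_princomp <- unname(princomp(X)$sdev)   # variance divisor n

# Rescaling prcomp's values by sqrt((n - 1) / n) reproduces princomp's
sd_rescaled <- sd_prcomp * sqrt((n - 1) / n)

rbind(sd_prcomp, sd_princomp, sd_rescaled)

# The same factor applied to PC1 from this post (350596 observations)
857.5749 * sqrt(350595 / 350596)
```

With 350596 observations the factor is so close to 1 that the difference only shows in the fourth decimal place (857.5749 versus 857.5737 for the first component in the listings below).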

------------------------------------------
# 4. Comparing PCA results ### ### ### ###

# prcomp() #############################################################

### Details for prcomp {stats} from R Documentation
The calculation is done by a singular value decomposition of the
(centered and possibly scaled) data matrix, not by using eigen on the
covariance matrix. This is generally the preferred method for numerical
accuracy.
## more details in [2]
#Default settings of prcomp(): _center=TRUE_ and _scale=FALSE_

### with center=FALSE, scale=FALSE
prcomp(mod07, center=FALSE, scale=FALSE) <<== this corresponds to i.pca

Standard deviations:
[1] 4288.3788 476.8904 114.3971

Rotation:
PC1 PC2 PC3
MOD2007_242_500_sur_refl_b02 -0.6353238 0.7124070 -0.2980602
MOD2007_242_500_sur_refl_b06 -0.6485551 -0.2826985 0.7067234
MOD2007_242_500_sur_refl_b07 -0.4192135 -0.6423066 -0.6416403

### with center=TRUE, scale=FALSE
prcomp(mod07, center=TRUE, scale=FALSE)
Standard deviations:
[1] 857.5749 436.0928 108.5085

Rotation:
PC1 PC2 PC3
MOD2007_242_500_sur_refl_b02 0.4184468 0.83881578 -0.3482677
MOD2007_242_500_sur_refl_b06 0.7249935 -0.07751872 0.6843794
MOD2007_242_500_sur_refl_b07 0.5470710 -0.53886820 -0.6405735

### with center=TRUE, scale=TRUE
prcomp(mod07, center=TRUE, scale=TRUE)
Standard deviations:
[1] 1.5030740 0.8397807 0.1885121

Rotation:
PC1 PC2 PC3
MOD2007_242_500_sur_refl_b02 0.4807060 0.8202848 -0.3099267
MOD2007_242_500_sur_refl_b06 0.6561893 -0.1020540 0.7476634
MOD2007_242_500_sur_refl_b07 0.5816677 -0.5627768 -0.5873201

### with center=FALSE, scale=TRUE
prcomp(mod07, center=FALSE, scale=TRUE)
Standard deviations:
[1] 1.71889117 0.20787871 0.04689961

Rotation:
PC1 PC2 PC3
MOD2007_242_500_sur_refl_b02 -0.5747640 0.74015136 -0.3490305
MOD2007_242_500_sur_refl_b06 -0.5812896 -0.06907305 0.8107597
MOD2007_242_500_sur_refl_b07 -0.5759763 -0.66888331 -0.4699430

# i.pca ##############################################################

# perform pca
i.pca input=MOD07_b02,MOD07_b06,MOD07_b07 output=TEST

Eigen values, (vectors), and [percent importance]:
PC1 6307563.04 ( -0.64 -0.65 -0.42 ) [ 98.71% ]
PC2 78023.63 ( -0.71 0.28 0.64 ) [ 1.22% ]
PC3 4504.60 ( -0.30 0.71 -0.64 ) [ 0.07% ]

### Comment: Comparing with prcomp()’s results it is obvious that
“i.pca” implements the SVD method _without_ centering and _without_
scaling the data prior to the analysis. ###

# princomp() – cov #####################################################

# Details for princomp {stats} from R Documentation
The calculation is done using eigen on the correlation or covariance
matrix

### using _covariance_ matrix
# prints only standard deviations
princomp(mod07)

Call:
princomp(x = mod07)

Standard deviations:
Comp.1 Comp.2 Comp.3
857.5737 436.0922 108.5083

3 variables and 350596 observations.

# get loadings (=rotation = eigenvectors)
(princomp(mod07))$loadings

Loadings:
                             Comp.1 Comp.2 Comp.3
MOD2007_242_500_sur_refl_b02 -0.418  0.839  0.348
MOD2007_242_500_sur_refl_b06 -0.725        -0.684
MOD2007_242_500_sur_refl_b07 -0.547 -0.539  0.641

Comp.1 Comp.2 Comp.3
SS loadings 1.000 1.000 1.000
Proportion Var 0.333 0.333 0.333
Cumulative Var 0.333 0.667 1.000

# m.eigensystem – cov #################################################

# Determine eigen values and vectors of _covariance_ matrix
## Marked with “#” are the _N_’s that correspond to the eigenvectors
derived from R’s princomp() ##
(echo 3; r.covar MOD07_b02,MOD07_b06,MOD07_b07)|m.eigensystem

r.covar: complete …
100%
E 778244.0258462029 .0000000000 79.20
V .5006581842 .0000000000
V .8256483300 .0000000000
V .6155834548 .0000000000
N # .4372107421 # .0000000000
N # .7210155161 # .0000000000
N # .5375717557 # .0000000000
W 385.6991853500 .0000000000
W 636.0664787886 .0000000000
W 474.2358050886 .0000000000
E 192494.5769628266 .0000000000 19.59
V -.8689798010 .0000000000
V .0996340298 .0000000000
V .5731134848 .0000000000
N # -.8309940700 # .0000000000
N # .0952787255 # .0000000000
N # .5480609638 # .0000000000
W -364.5920328433 .0000000000
W 41.8027823088 .0000000000
W 240.4573848757 .0000000000
E 11876.4548199713 .0000000000 1.21
V .2872248982 .0000000000
V -.5731591248 .0000000000
V .5351449518 .0000000000
N # .3439413070 # .0000000000
N # -.6863370819 # .0000000000
N # .6408165005 # .0000000000
W 37.4824307850 .0000000000
W -74.7964308085 .0000000000
W 69.8356366100 .0000000000

# princomp() – cor #####################################################

### using _correlation_ matrix
princomp(mod07, cor=TRUE)

Call:
princomp(x = mod07, cor = TRUE)

Standard deviations:
Comp.1 Comp.2 Comp.3
1.5030740 0.8397807 0.1885121

3 variables and 350596 observations.

# get loadings
(princomp(mod07, cor=TRUE))$loadings

Loadings:
Comp.1 Comp.2 Comp.3
MOD2007_242_500_sur_refl_b02 -0.481 0.820 0.310
MOD2007_242_500_sur_refl_b06 -0.656 -0.102 -0.748
MOD2007_242_500_sur_refl_b07 -0.582 -0.563 0.587

Comp.1 Comp.2 Comp.3
SS loadings 1.000 1.000 1.000
Proportion Var 0.333 0.333 0.333
Cumulative Var 0.333 0.667 1.000

# m.eigensystem – corr #################################################

# Determine eigen values and vectors of _correlation_ matrix
## Marked with “#” are the _N_’s that correspond to the eigenvectors
derived from R’s princomp() ##
(echo 3; r.covar -r MOD07_b02,MOD07_b06,MOD07_b07)|m.eigensystem

r.covar: complete …
100%
E 2.2915877718 .0000000000 76.39
V -.5755655569 .0000000000
V -.7660355041 .0000000000
V -.6809380186 .0000000000
N # -.4896413269 # .0000000000
N # -.6516766616 # .0000000000
N # -.5792830912 # .0000000000
W -.7412186091 .0000000000
W -.9865075560 .0000000000
W -.8769182329 .0000000000
E .6740687010 .0000000000 22.47
V .8667178982 .0000000000
V -.1116525720 .0000000000
V -.6069908335 .0000000000
N # .8145815825 # .0000000000
N # -.1049362531 # .0000000000
N # -.5704780699 # .0000000000
W .6687852213 .0000000000
W -.0861544341 .0000000000
W -.4683721194 .0000000000
E .0343435272 .0000000000 1.14
V .2486404469 .0000000000
V -.6006166822 .0000000000
V .4655120098 .0000000000
N # .3109794470 # .0000000000
N # -.7512029762 # .0000000000
N # .5822249325 # .0000000000
W .0576307320 .0000000000
W -.1392129859 .0000000000
W .1078979635 .0000000000

[1] http://www.snl.salk.edu/~shlens/pub/notes/pca.pdf

[2] Copy-pasted from R Documentation

### prcomp() returns the following:

1. sdev
the standard deviations of the principal components (i.e.,
the square roots of the eigenvalues of the cov./cor. matrix,
though the calculation is actually done with the singular
values of the data matrix).

2. rotation
the matrix of variable loadings (i.e., a matrix whose
columns contain the eigenvectors). The function princomp
returns this in the element loadings.

* Note

The signs of the columns of the rotation matrix are
arbitrary, and so may differ between different programs
for PCA, and even between different builds of R.

Reposted from: http://osgeo-org.1560.n6.nabble.com/Testing-i-pca-prcomp-m-eigensystem-princomp-td4049804.html

Using prcomp/princomp for PCA in R (2)

szypanther — Fri, 31 Aug 2012 04:08:34 +0000

###############################

PCA
###############################
install.packages("vegan")
library(vegan)

> STpcoa<-read.table(file="bactera_16s_final.subsample.phylip.tre1.weighted.phylip.pcoa.axes", header=T,row.names=1)
> STpcoa
axis1 axis2 axis3 axis4
Cellulose -0.020878 -0.234601 0.167454 0
Foodwaste -0.234592 0.221741 0.085802 0
Sludge 0.368882 0.100725 -0.010570 0
Xylan -0.113413 -0.087865 -0.242686 0
> pl.STpcoa<-princomp(STpcoa)
> summary(pl.STpcoa)
Importance of components:
Comp.1 Comp.2 Comp.3 Comp.4
Standard deviation 0.2260563 0.1746944 0.1536319 0
Proportion of Variance 0.4856521 0.2900347 0.2243133 0
Cumulative Proportion 0.4856521 0.7756867 1.0000000 1

> ls(pl.STpcoa)
[1] "call" "center" "loadings" "n.obs" "scale" "scores" "sdev"
> class(pl.STpcoa)
[1] "princomp"
> nmds.col<-c(rep("green", 1), rep("blue", 1), rep("black",1), rep("red",1))
> plot(pl.STpcoa$scores, col=nmds.col, pch=20)
> legend(x=0.12, y=0.25, c("Cellulose","Foodwaste","Sludge","Xylan"),c("green","blue","black","red"),bty="n")

> biplot(pl.STpcoa)

##############################
> pl2.STpcoa<-prcomp(STpcoa)
> class(pl2.STpcoa)
[1] "prcomp"
> pl2.STpcoa$sdev^2
[1] 0.06813525 0.04069083 0.03147035 0.00000000
> summary(pl2.STpcoa)
Importance of components:
PC1 PC2 PC3 PC4
Standard deviation 0.2610 0.2017 0.1774 0
Proportion of Variance 0.4857 0.2900 0.2243 0
Cumulative Proportion 0.4857 0.7757 1.0000 1
> pl2.STpcoa$x
PC1 PC2 PC3 PC4
Cellulose -0.02087762 -0.2346017 0.16745306 0
Foodwaste -0.23459165 0.2217407 0.08580311 0
Sludge 0.36888225 0.1007250 -0.01056992 0
Xylan -0.11341297 -0.0878640 -0.24268626 0
> pl2.STpcoa$rotation
PC1 PC2 PC3 PC4
axis1 1.000000e+00 -9.352971e-08 -8.835157e-07 0
axis2 9.353330e-08 1.000000e+00 4.064575e-06 0
axis3 8.835153e-07 -4.064576e-06 1.000000e+00 0
axis4 0.000000e+00 0.000000e+00 0.000000e+00 1
> plot(pl2.STpcoa$x, col=nmds.col, pch=20)
> legend(x=0.12, y=0.25, c("Cellulose","Foodwaste","Sludge","Xylan"),c("green","blue","black","red"),bty="n")

> screeplot(pl2.STpcoa,type="lines",main="Scree Plot")
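
The "Proportion of Variance" row in the summary() output can be rebuilt by hand from the squared standard deviations. A small sketch that only reuses the sdev^2 values printed above (the variable names are made up):

```r
# sdev^2 values printed for pl2.STpcoa in the transcript above
pc_var <- c(0.06813525, 0.04069083, 0.03147035, 0.00000000)

# Share of the total variance carried by each component
prop_var <- pc_var / sum(pc_var)

# Running total, as in the "Cumulative Proportion" row
cum_var <- cumsum(prop_var)

round(rbind(prop_var, cum_var), 4)
```

Because these are ratios, princomp() and prcomp() report the same proportions even though their standard deviations differ (divisor n versus n - 1).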

bactera_16s_final.subsample.phylip.tre1.weighted.phylip.pcoa.axes

Using prcomp/princomp for PCA in R (1)

szypanther — Fri, 31 Aug 2012 04:00:07 +0000

Difference between prcomp and princomp:

‘princomp’ can only be used with more units than variables.

prcomp is based on singular value decomposition (the svd() function); princomp is based on an eigendecomposition (the eigen() function).

Good video source:

http://www.youtube.com/watch?v=oZ2nfIPdvjY

http://www.youtube.com/watch?v=I5GxNzKLIoU&feature=relmfu

http://www.planta.cn/forum/viewtopic.php?t=16754&highlight=%D3%EF%D1%D4

###########################################

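All of the code below, including the practice data, can be run directly in R.

# Principal component analysis and principal component regression
The idea of principal component analysis was proposed by Pearson in 1901 and developed further by Hotelling in 1933.
In R, principal component analysis is carried out with the princomp() function.

Usage
princomp(x, cor = FALSE, scores = TRUE, covmat = NULL,
subset = rep(TRUE, nrow(as.matrix(x))), ...)

# x: the data to be analysed
# cor: whether to use the sample correlation matrix (TRUE) rather than the covariance matrix (FALSE, the default)
Related functions:
1. prcomp()
2. summary()
3. loadings()
4. predict()
5. screeplot()
6. biplot()

Example
Thirty students were sampled at random from one grade of a middle school, and their height, weight, chest circumference and sitting height were measured. A principal component analysis is run on these four body measurements.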
student<-data.frame(
X1=c(148, 139, 160, 149, 159, 142, 153, 150, 151, 139,
140, 161, 158, 140, 137, 152, 149, 145, 160, 156,
151, 147, 157, 147, 157, 151, 144, 141, 139, 148 ),
X2=c(41, 34, 49, 36, 45, 31, 43, 43, 42, 31,
29, 47, 49, 33, 31, 35, 47, 35, 47, 44,
42, 38, 39, 30, 48, 36, 36, 30, 32, 38 ),
X3=c(72, 71, 77, 67, 80, 66, 76, 77, 77, 68,
64, 78, 78, 67, 66, 73, 82, 70, 74, 78,
73, 73, 68, 65, 80, 74, 68, 67, 68, 70),
X4=c(78, 76, 86, 79, 86, 76, 83, 79, 80, 74,
74, 84, 83, 77, 73, 79, 79, 77, 87, 85,
82, 78, 80, 75, 88, 80, 76, 76, 73, 78 )
)
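# principal component analysis
student.pr <- princomp(student, cor = TRUE)
# show the results
summary(student.pr, loadings=TRUE)
# predict: show the principal component scores of each sample
pre<-predict(student.pr)
# scree plot
screeplot(student.pr,type="lines")
# biplot of the principal components
biplot(student.pr)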

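Example 2
Body measurements were taken on 128 adult men, 16 per person: height, sitting height, chest circumference, head height, trouser length, crotch length, hand length, collar size, front chest, back, shoulder thickness, shoulder width, sleeve length, rib circumference, waist circumference and calf, denoted X1-X16. The correlation matrix R of the 16 measurements is given. Starting from the correlation matrix, run a principal component analysis and use it to group the 16 measurements.
Commands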
x<-c(
1.00,
0.79, 1.00,
0.36, 0.31, 1.00,
0.96, 0.74, 0.38, 1.00,
0.89, 0.58, 0.31, 0.90, 1.00,
0.79, 0.58, 0.30, 0.78, 0.79, 1.00,
0.76, 0.55, 0.35, 0.75, 0.74, 0.73, 1.00,
0.26, 0.19, 0.58, 0.25, 0.25, 0.18, 0.24, 1.00,
0.21, 0.07, 0.28, 0.20, 0.18, 0.18, 0.29,-0.04, 1.00,
0.26, 0.16, 0.33, 0.22, 0.23, 0.23, 0.25, 0.49,-0.34, 1.00,
0.07, 0.21, 0.38, 0.08,-0.02, 0.00, 0.10, 0.44,-0.16, 0.23, 1.00,
0.52, 0.41, 0.35, 0.53, 0.48, 0.38, 0.44, 0.30,-0.05, 0.50, 0.24, 1.00,
0.77, 0.47, 0.41, 0.79, 0.79, 0.69, 0.67, 0.32, 0.23, 0.31, 0.10, 0.62, 1.00,
0.25, 0.17, 0.64, 0.27, 0.27, 0.14, 0.16, 0.51, 0.21, 0.15, 0.31, 0.17, 0.26, 1.00,
0.51, 0.35, 0.58, 0.57, 0.51, 0.26, 0.38, 0.51, 0.15, 0.29, 0.28, 0.41, 0.50, 0.63, 1.00,
0.21, 0.16, 0.51, 0.26, 0.23, 0.00, 0.12, 0.38, 0.18, 0.14, 0.31, 0.18, 0.24, 0.50, 0.65, 1.00
)
names<-c("X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8", "X9",
"X10", "X11", "X12", "X13", "X14", "X15", "X16")
R<-matrix(0, nrow=16, ncol=16, dimnames=list(names, names))
# x holds the lower triangle row by row; fill both halves of the symmetric matrix
for (i in 1:16){
for (j in 1:i){
R[i,j]<-x[(i-1)*i/2+j]; R[j,i]<-R[i,j]
}
}
# principal component analysis
pr<-princomp(covmat=R)
load<-loadings(pr)

# plot the loadings on the first two components
plot(load[,1:2])
text(load[,1], load[,2], adj=c(-0.4, 0.3))
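
The double loop that fills the correlation matrix can also be written without explicit loops. This is only a sketch of an alternative, not the method used above: `fill_symmetric` is a made-up helper name, and the 3 x 3 example reuses the first six values of x. It relies on the fact that a lower triangle listed row by row has the same order as the upper triangle listed column by column.

```r
# Build a symmetric matrix from its lower triangle given row by row
# (hypothetical helper, written for this illustration)
fill_symmetric <- function(x, n, names = NULL) {
  stopifnot(length(x) == n * (n + 1) / 2)
  m <- matrix(0, nrow = n, ncol = n, dimnames = list(names, names))
  # row-wise lower triangle == column-wise upper triangle
  m[upper.tri(m, diag = TRUE)] <- x
  # mirror the upper triangle into the lower one
  m[lower.tri(m)] <- t(m)[lower.tri(m)]
  m
}

# First three rows of the triangle from the example above
x_small <- c(1.00,
             0.79, 1.00,
             0.36, 0.31, 1.00)
R_small <- fill_symmetric(x_small, 3, c("X1", "X2", "X3"))
R_small
```

For the full example the call would be fill_symmetric(x, 16, names), after which princomp(covmat = ...) can be used as before.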

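Principal component regression
Consider the relationship between total imports Y and three predictors: gross domestic product, stocks and total consumption. Data were collected for the 11 years 1949-1959; fit an ordinary linear regression and a principal component regression.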
conomy<-data.frame(
x1=c(149.3, 161.2, 171.5, 175.5, 180.8, 190.7, 202.1, 212.4, 226.1, 231.9, 239.0),
x2=c(4.2, 4.1, 3.1, 3.1, 1.1, 2.2, 2.1, 5.6, 5.0, 5.1, 0.7),
x3=c(108.1, 114.8, 123.2, 126.9, 132.1, 137.7, 146.0, 154.1, 162.3, 164.3, 167.6),
y=c(15.9, 16.4, 19.0, 19.1, 18.8, 20.4, 22.7, 26.5, 28.1, 27.6, 26.3)
)

Linear regression
lm.sol<-lm(y~x1+x2+x3, data=conomy)
summary(lm.sol)
Principal component regression

# principal component analysis
conomy.pr<-princomp(~x1+x2+x3, data=conomy, cor=T)
summary(conomy.pr, loadings=TRUE)
pre<-predict(conomy.pr)
conomy$z1<-pre[,1]; conomy$z2<-pre[,2]
lm.sol<-lm(y~z1+z2, data=conomy)
summary(lm.sol)
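
The regression on z1 and z2 gives coefficients for the components, not for x1, x2 and x3. A common follow-up, sketched here under the assumption that one wants an equation in the original variables, is to push the coefficients back through the loadings and undo the centring and scaling. The names `pca`, `pcr`, `b_x` and `b_0` are made up for this sketch.

```r
# Same data as above
conomy <- data.frame(
  x1 = c(149.3, 161.2, 171.5, 175.5, 180.8, 190.7, 202.1, 212.4, 226.1, 231.9, 239.0),
  x2 = c(4.2, 4.1, 3.1, 3.1, 1.1, 2.2, 2.1, 5.6, 5.0, 5.1, 0.7),
  x3 = c(108.1, 114.8, 123.2, 126.9, 132.1, 137.7, 146.0, 154.1, 162.3, 164.3, 167.6),
  y  = c(15.9, 16.4, 19.0, 19.1, 18.8, 20.4, 22.7, 26.5, 28.1, 27.6, 26.3)
)

pca    <- princomp(~ x1 + x2 + x3, data = conomy, cor = TRUE)
scores <- predict(pca)
pcr    <- lm(conomy$y ~ scores[, 1] + scores[, 2])   # regression on z1, z2

beta <- unname(coef(pcr))        # intercept, slope on z1, slope on z2
A    <- unclass(loadings(pca))   # columns are the eigenvectors

# slope for each original variable: combine the loadings, then undo the scaling
b_x <- (beta[2] * A[, 1] + beta[3] * A[, 2]) / pca$scale
# intercept: undo the centring
b_0 <- beta[1] - sum(pca$center * b_x)

# fitted values computed directly from x1, x2, x3
y_hat <- as.vector(b_0 + as.matrix(conomy[, c("x1", "x2", "x3")]) %*% b_x)

c(intercept = b_0, b_x)
```

The fitted values from this equation are identical to those of the regression on z1 and z2; only the parametrisation changes.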

4sample CA RDA analysis

szypanther — Thu, 30 Aug 2012 04:14:47 +0000

> gtsdata_data=read.table("gtsdata.txt", header=T)
> gtsenv=read.table("gtsenv.txt", header=T)
> gtsdata_data_t<-t(gtsdata_data)
> decorana(gtsdata_data_t)

Call:
decorana(veg = gtsdata_data_t)

Detrended correspondence analysis with 26 segments.
Rescaling of axes with 4 iterations.

DCA1 DCA2 DCA3 DCA4
Eigenvalues 0.8634 0.4834 0.23788 0
Decorana values 0.8721 0.3793 0.07223 0
Axis lengths 5.3292 2.1115 1.80907 0

> gts.ca=cca(gtsdata_data_t)
> gts.ca
Call: cca(X = gtsdata_data_t)

Inertia Rank
Total 1.653
Unconstrained 1.653 3
Inertia is mean squared contingency coefficient

Eigenvalues for unconstrained axes:
CA1 CA2 CA3
0.8721 0.5037 0.2776

> plot(gts.ca,scaling=3)

> gtsdata_data_t_del<-gtsdata_data_t[1:3,]
> gtsdata_data_t_del

> gts.rda=rda(gtsdata_data_t_del,gtsenv)
> gts.rda
Call: rda(X = gtsdata_data_t_del, Y = gtsenv)

Inertia Proportion Rank
Total 101790 1
Constrained 101790 1 2
Unconstrained 0 0 0
Inertia is variance
Some constraints were aliased because they were collinear (redundant)

Eigenvalues for constrained axes:
RDA1 RDA2
81240 20549

plot(gts.rda,display=c("sp","bp","si"),scaling=3)

gtsenv.txt

gtsdata.txt

Converting a character matrix to a numeric matrix in R

szypanther — Thu, 30 Aug 2012 01:34:48 +0000

a.str <- matrix(c('1','2','3','5',NA,'6'),
nrow=2, ncol=3, dimnames = list(c('g1','g2'),c('t1','t2','t3')))

a.str
# t1 t2 t3
# g1 "1" "3" NA
# g2 "2" "5" "6"

a.num <- apply(a.str, c(1,2), as.numeric)

a.num
#    t1 t2 t3
# g1  1  3 NA
# g2  2  5  6
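
An alternative to apply(), shown as a sketch rather than a recommendation: changing the storage mode converts the whole matrix in one step and keeps its dimensions and dimnames. The name `a.num2` is made up.

```r
a.str <- matrix(c('1', '2', '3', '5', NA, '6'),
                nrow = 2, ncol = 3,
                dimnames = list(c('g1', 'g2'), c('t1', 't2', 't3')))

# Copy the matrix, then switch its storage mode from character to double
a.num2 <- a.str
storage.mode(a.num2) <- "double"

a.num2
```

Strings that are not valid numbers become NA with a warning, the same as with as.numeric().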

Note: the cell at the first row, first column must be left empty!

Learning ordination analysis of ecological data with the Vegan package

szypanther — Tue, 28 Aug 2012 04:40:20 +0000

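“Ordination analysis of ecological data based on the Vegan package
Lai Jiangshan, Mi Xiangcheng
(State Key Laboratory of Vegetation and Environmental Change, Institute of Botany, Chinese Academy of Sciences, Beijing 100093)
Abstract: Community data are generally multidimensional, for example species attributes or environmental-factor attributes. Multivariate statistics are commonly used in community ecology, and ordination is one of the most widely used multivariate methods. CANOCO is a widely used ordination program, but as commercial software it is expensive and new versions appear slowly. In recent years the R language, being flexible, open, easy to learn and free, has quickly won over researchers in ecology and biodiversity research. The add-on R package “Vegan” is a tool dedicated to community-ecology analysis. Vegan provides all the basic ordination methods, can produce attractive ordination plots, and is updated frequently. We believe that the Vegan package can fully replace CANOCO and become the statistical tool of choice for ordination analysis. This paper first outlines the principles and types of ordination, then introduces basic information about Vegan and how to download and install it, and finally uses 40 randomly selected 20 m × 20 m quadrats from the 24-ha Gutianshan plot to demonstrate the common ordination methods in Vegan (PCA, RDA, CA and CCA) and how the ordination plots are produced, in the hope of helping R beginners become familiar with the Vegan package and use it for ordination analysis.”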

gtsdata

gtsenv.txt

赖江山.pdf

> setwd("/winxp_disk2/shenzy/R/Vegan")
> gtsdata=read.table("gtsdata.txt", header=T)
> gtsenv=read.table("gtsenv.txt", header=T)
> install.packages("vegan")
Installing package(s) into ‘/home/shenzy/R/x86_64-pc-linux-gnu-library/2.15’
(as ‘lib’ is unspecified)
trying URL 'http://cran.csiro.au/src/contrib/vegan_2.0-4.tar.gz'
Content type 'application/x-gzip' length 1576584 bytes (1.5 Mb)
opened URL
==================================================
downloaded 1.5 Mb

* installing *source* package ‘vegan’ ...
** package ‘vegan’ successfully unpacked and MD5 sums checked
** libs
gfortran   -fpic  -O3 -pipe  -g  -c cepin.f -o cepin.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG      -fpic  -O3 -pipe  -g  -c data2hill.c -o data2hill.o
gfortran   -fpic  -O3 -pipe  -g  -c decorana.f -o decorana.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG      -fpic  -O3 -pipe  -g  -c goffactor.c -o goffactor.o
gfortran   -fpic  -O3 -pipe  -g  -c monoMDS.f -o monoMDS.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG      -fpic  -O3 -pipe  -g  -c nestedness.c -o nestedness.o
gfortran   -fpic  -O3 -pipe  -g  -c ordering.f -o ordering.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG      -fpic  -O3 -pipe  -g  -c pnpoly.c -o pnpoly.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG      -fpic  -O3 -pipe  -g  -c stepacross.c -o stepacross.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG      -fpic  -O3 -pipe  -g  -c vegdist.c -o vegdist.o
gcc -std=gnu99 -shared -o vegan.so cepin.o data2hill.o decorana.o goffactor.o monoMDS.o nestedness.o ordering.o pnpoly.o stepacross.o vegdist.o -lgfortran -lm -lquadmath -L/usr/lib/R/lib -lR
installing to /home/shenzy/R/x86_64-pc-linux-gnu-library/2.15/vegan/libs
** R
** data
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
   ‘decision-vegan.Rnw’
   ‘diversity-vegan.Rnw’ using ‘UTF-8’
   ‘intro-vegan.Rnw’ using ‘UTF-8’
** testing if installed package can be loaded

* DONE (vegan)

The downloaded source packages are in
	‘/tmp/RtmpmtXtEK/downloaded_packages’
> library(vegan)
Loading required package: permute

Attaching package: ‘permute’

The following object(s) are masked from ‘package:gtools’:

    permute

This is vegan 2.0-4
> decorana(gtsdata)

Call:
decorana(veg = gtsdata) 

Detrended correspondence analysis with 26 segments.
Rescaling of axes with 4 iterations.

                  DCA1   DCA2    DCA3    DCA4
Eigenvalues     0.3939 0.2239 0.09555 0.06226
Decorana values 0.5025 0.1756 0.06712 0.03877
Axis lengths    3.2595 2.5130 1.21445 1.00854

> gts.pca=rda(gtsdata)
> gts.pca
Call: rda(X = gtsdata)

              Inertia Rank
Total           352.1
Unconstrained   352.1   22
Inertia is variance 

Eigenvalues for unconstrained axes:
    PC1     PC2     PC3     PC4     PC5     PC6     PC7     PC8
111.779  73.580  54.607  32.959  26.481  18.063  12.763   7.637
(Showed only 8 of all 22 unconstrained eigenvalues)

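Note: the commands above are used to choose the ordination model (linear models: PCA, RDA; unimodal models: CA, CCA), because the Axis lengths correspond to those of a DCA analysis in CANOCO.
If the largest DCA axis length is > 4, choose a unimodal model; if it is < 3, choose a linear model; 3 …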
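
The rule of thumb in the note can be written as a tiny helper. This is only a sketch: `choose_ordination_model` is a made-up name, and because the note is cut off after "3", the handling of lengths between 3 and 4 (returning "either") is an assumption based on the usual form of this rule, not something stated in the post.

```r
# Pick an ordination model family from DCA axis lengths (hypothetical helper)
choose_ordination_model <- function(axis_lengths) {
  longest <- max(axis_lengths)
  if (longest > 4) {
    "unimodal (CA/CCA)"
  } else if (longest < 3) {
    "linear (PCA/RDA)"
  } else {
    "either"   # assumption: the 3 to 4 range is treated as undecided
  }
}

# Axis lengths printed by decorana(gtsdata) above
choose_ordination_model(c(3.2595, 2.5130, 1.21445, 1.00854))

# Axis lengths from the 4-sample example in the earlier post
choose_ordination_model(c(5.3292, 2.1115, 1.80907, 0))
```

For the gtsdata example the longest axis is 3.2595, so the rule does not force a choice, which fits with the post going on to run both PCA/RDA and CA/CCA.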
> plot(gts.pca)

 
 >biplot(gts.pca)

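The plot above is heavily overlapped because the species distributions differ a lot: unevenly distributed species take up most of the ordination space. The species data can be standardized to unit variance, which is done with the scale argument, as follows: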
> gts.pca=rda(gtsdata, scale=T)
> biplot(gts.pca,scaling=3)

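Note: scaling=1 focuses on the relationships among sites (sample plots)
scaling=2 focuses on the relationships among species
scaling=3 is symmetric and focuses on the relationships between sites and species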
> biplot(gts.pca,display="sp")
> biplot(gts.pca,display="si")
> biplot(gts.pca,display="sp", choices=c(1,3))


CA analysis:
> gts.ca=cca(gtsdata)
> gts.ca
Call: cca(X = gtsdata)

              Inertia Rank
Total           1.424
Unconstrained   1.424   21
Inertia is mean squared contingency coefficient 

Eigenvalues for unconstrained axes:
    CA1     CA2     CA3     CA4     CA5     CA6     CA7     CA8
0.50253 0.26564 0.14023 0.10502 0.09127 0.05540 0.05063 0.04204
(Showed only 8 of all 21 unconstrained eigenvalues)
> plot(gts.ca,scaling=3)

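Reading the CA plot: if a species lies close to a sample plot, that species probably contributes strongly to the plot's position. The figure shows that plot 20 is very close to 短柄枹 (QUESER). Plots 19 and 20 are also close together, which suggests that their species composition is similar. Species that occur in only a few plots, such as CASCAR, usually sit at the edge of the ordination space, which indicates that they occur only occasionally; the values in that column are small or 0 for nearly all plots. For species near the centre of the ordination, the sampled area may be where they are optimally distributed; the corresponding column (CASERY) has many large values.
RDA analysis (analysis involving more than one matrix):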
> gts.rda=rda(gtsdata,gtsenv)
> gts.rda
Call: rda(X = gtsdata, Y = gtsenv)

               Inertia Proportion Rank
Total         352.0917     1.0000
Constrained   137.4026     0.3902    8
Unconstrained 214.6891     0.6098   22
Inertia is variance 

Eigenvalues for constrained axes:
   RDA1    RDA2    RDA3    RDA4    RDA5    RDA6    RDA7    RDA8
56.3864 42.7769 17.8270 13.5066  2.5020  2.1217  1.6616  0.6203 

Eigenvalues for unconstrained axes:
   PC1    PC2    PC3    PC4    PC5    PC6    PC7    PC8
72.287 54.891 26.618 17.959 12.730  9.918  5.659  5.349
(Showed only 8 of all 22 unconstrained eigenvalues)
plot(gts.rda,display=c("sp","bp","si"),scaling=3)

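In an RDA ordination, the length of an arrow shows how strongly that
environmental factor is correlated with the distribution of communities
and species: the longer the arrow, the stronger the correlation.
The angle between an arrow and an ordination axis shows how strongly that
environmental factor is correlated with the axis: the smaller the angle, the stronger the correlation!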
> gts.prda=rda(gtsdata,gtsenv[,1:4], gtsenv[,5:8])
> gts.prda
Call: rda(X = gtsdata, Y = gtsenv[, 1:4], Z = gtsenv[, 5:8])

               Inertia Proportion Rank
Total         352.0917     1.0000
Conditional    95.0318     0.2699    4
Constrained    42.3708     0.1203    4
Unconstrained 214.6891     0.6098   22
Inertia is variance 

Eigenvalues for constrained axes:
  RDA1   RDA2   RDA3   RDA4
27.522  9.087  3.442  2.320 

Eigenvalues for unconstrained axes:
   PC1    PC2    PC3    PC4    PC5    PC6    PC7    PC8
72.287 54.891 26.618 17.959 12.730  9.918  5.659  5.349
(Showed only 8 of all 22 unconstrained eigenvalues)
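Note: gtsenv[,1:4] means that only the first 4 columns of the environment matrix are used as
constraints, i.e. the topographic factors. Constrained is 42.37 / 352.09 = 12.03%, the share of
the total eigenvalue sum explained by the topographic factors alone. Swapping Y and Z gives the
share explained by the soil factors alone (15.53%). The share explained by both groups together
was already calculated above: 39.02%. The share explained jointly by the two groups of
environmental variables is therefore 39.02% - 15.53% - 12.03% = 11.46%!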
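
The arithmetic in the note, as a short sketch that only reuses the inertia values printed above. The soil-only share (15.53%) is taken from the note itself, since the swapped Y/Z run is not shown; the variable names are made up.

```r
total_inertia   <- 352.0917   # Total, from either rda() printout
constrained_all <- 137.4026   # Constrained in rda(gtsdata, gtsenv)
topo_alone      <- 42.3708    # Constrained in rda(gtsdata, gtsenv[,1:4], gtsenv[,5:8])
conditional     <- 95.0318    # Conditional in the same partial RDA (soil, incl. shared part)
soil_alone_frac <- 0.1553     # quoted in the note; that run is not shown in the post

frac_all  <- constrained_all / total_inertia
frac_topo <- topo_alone / total_inertia

# Shared part: everything explained, minus the two "alone" parts
frac_shared <- frac_all - soil_alone_frac - frac_topo

# Cross-check: the Conditional share should equal soil alone + shared
frac_shared_check <- conditional / total_inertia - soil_alone_frac

round(100 * c(all = frac_all, topo = frac_topo,
              soil = soil_alone_frac, shared = frac_shared), 2)
```

vegan also has a varpart() function for this kind of partitioning; it reports adjusted R-squared values, so its numbers would not match these raw inertia shares exactly.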

CCA analysis is similar
> gts.cca=cca(gtsdata,gtsenv)

R OTU heatmap2

szypanther — Thu, 23 Aug 2012 07:24:35 +0000

source("http://www.bioconductor.org/biocLite.R");
biocLite("affy");
biocLite("Biobase");
library(affy);
library(Biobase);
library(gplots);  # provides heatmap.2() and greenred()

> bac_4sampledata=read.csv("/home/R_heatmap/4sample_R_cluster.csv", sep="\t")
> row.names(bac_4sampledata)<-bac_4sampledata$Group
> bac_4sample_Datamatrix<-data.matrix(bac_4sampledata[,2:5])
> heatmap.2(bac_4sample_Datamatrix, distfun=dist,col=greenred(256), scale="row", key=TRUE, symkey=FALSE, density.info="none", trace="none", cexRow=0.5, cexCol=0.7,margin=c(7,30), keysize=1.5);

4sample_R_cluster_stdtop100

 > heatmap.2(bac_4sample_Datamatrix, distfun = function(x) dist(x,method = 'euclidean'),hclustfun = function(x) hclust(x,method = 'centroid'),col=greenred(256), scale="row", key=TRUE, symkey=FALSE, density.info="none", trace="none", cexRow=0.5, cexCol=0.7,margin=c(7,30), keysize=1.5);

Change heatmap.2 defaults dist for calculating the distance matrix and hclust for clustering

szypanther — Wed, 22 Aug 2012 07:58:43 +0000

Glancing at the code for heatmap.2 I’m fairly sure that the default is to use dist, and its default is in turn to use euclidean distances.

The reason your attempt at passing distfun = dist(method = 'euclidean') didn’t work is that distfun (and hclustfun) are supposed to be functions, not calls to functions. So if you want to alter the defaults and pass arguments you need to write a wrapper function like this:

heatmap.2(...,hclustfun = function(x) hclust(x,method = 'centroid'),...)

As I mentioned, I’m fairly certain that heatmap.2 is using euclidean distances by default, but a similar solution can be used to alter the distance function used:

heatmap.2(...,distfun = function(x) dist(x,method = 'euclidean'),...)
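
A runnable sketch of the wrapper idea, assuming one wants manhattan distances and average linkage (both choices are arbitrary, picked only for the illustration). The wrappers are tried with base R first, so they can be checked without drawing anything; the heatmap.2() call only runs in an interactive session with gplots installed. The matrix is the same lipid table used in the example that follows.

```r
# Wrapper functions: each takes one argument and fixes the method
dist_manhattan <- function(x) dist(x, method = "manhattan")
hclust_average <- function(d) hclust(d, method = "average")

mat <- matrix(c(79, 38.6, 30.2, 10.8, 22,
                81, 37.7, 28.4,  9.7, 19.9,
                82, 36.2, 26.8,  9.8, 20.9,
                74, 29.9, 17.2,  6.1, 13.9,
                81, 37.4, 20.5,  6.7, 14.6), ncol = 5, byrow = TRUE)
rownames(mat) <- paste("Sample", 1:5)
colnames(mat) <- c("18:0", "18:1", "18:2", "18:3", "20:0")

# What heatmap.2 does internally for the rows: distances, then clustering
row_dist <- dist_manhattan(mat)
row_tree <- hclust_average(row_dist)

# Pass the wrappers themselves (no parentheses) to heatmap.2
if (interactive() && requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(mat, distfun = dist_manhattan, hclustfun = hclust_average,
                    trace = "none")
}
```

Naming the wrappers, instead of writing them inline, makes it easy to reuse the same distance and linkage for several heatmaps.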

library("gplots")
library("RColorBrewer")

test <- matrix(c(79,38.6,30.2,10.8,22,
81,37.7,28.4,9.7,19.9,
82,36.2,26.8,9.8,20.9,
74,29.9,17.2,6.1,13.9,
81,37.4,20.5,6.7,14.6),ncol=5,byrow=TRUE)
colnames(test) <- c("18:0","18:1","18:2","18:3","20:0")
rownames(test) <- c("Sample 1","Sample 2","Sample 3", "Sample 4","Sample 5")
test <- as.table(test)
mat=data.matrix(test)

heatmap.2(mat,
dendrogram="row",
Rowv=TRUE,
Colv=NULL,
distfun = dist,
hclustfun = hclust,
xlab = "Lipid Species",
ylab = NULL,
colsep=c(1),
sepcolor="black",
key=TRUE,
keysize=1,
trace="none",
density.info=c("none"),
margins=c(8, 12),
col=bluered
)