Using prcomp/princomp for PCA in R (二)

###############################

PCA ############################### install.packages(“vegan”) library(vegan)

> STpcoa<-read.table(file=”bactera_16s_final.subsample.phylip.tre1.weighted.phylip.pcoa.axes”, header=T,row.names=1) > STpcoa axis1 axis2 axis3 axis4 Cellulose -0.020878 -0.234601 0.167454 0 Foodwaste -0.234592 0.221741 0.085802 0 Sludge 0.368882 0.100725 -0.010570 0 Xylan -0.113413 -0.087865 -0.242686 0 >pl.STpcoa<-princomp(STpcoa) > summary(pl.STpcoa) Importance of components: Comp.1 Comp.2 Comp.3 Comp.4 Standard deviation 0.2260563 0.1746944 0.1536319 0 Proportion of Variance 0.4856521 0.2900347 […]

Using prcomp/princomp for PCA in R (一)

Difference between prcomp and princomp:

‘princomp’ can only be used with more units than variables”

prcomp是基于SVD分解(svd()函数,princomp是基于特征向量eigen()函数)

Good video source:

http://www.youtube.com/watch?v=oZ2nfIPdvjY

http://www.youtube.com/watch?v=I5GxNzKLIoU&feature=relmfu

http://www.planta.cn/forum/viewtopic.php?t=16754&highlight=%D3%EF%D1%D4

###########################################

以下所有代码包括练习数据,都可在R平台上直接运行。

#主成分分析和主成分回归 主成分分析的思想是Pearson 1901年提出的,Hotelling 1933进一步发展 在R中,进行主成分分析用到princomp() 函数

用法 princomp(x, cor = FALSE, scores = TRUE, covmat = NULL, subset = rep(TRUE, nrow(as.matrix(x))), …)

# 分析用数据 # cor 是否用样本的协方差矩阵作主成分分析 prcomp() 二 summary()函数 三 […]

4sample CA RDA analysis

 

> gtsdata_test=read.table(“gtsdata.txt”, header=T) > gtsenv=read.table(“gtsenv.txt”, header=T) > gtsdata_data_t<-t(gtsdata_data) > decorana(gtsdata_data_t)

Call: decorana(veg = gtsdata_data_t)

Detrended correspondence analysis with 26 segments. Rescaling of axes with 4 iterations.

DCA1 DCA2 DCA3 DCA4 Eigenvalues 0.8634 0.4834 0.23788 0 Decorana values 0.8721 0.3793 0.07223 0 Axis lengths 5.3292 2.1115 1.80907 0

> gts.ca=cca(gtsdata_data_t) > gts.ca Call: cca(X = […]

基于Vegan 软件包的生态学数据排序分析学习

“基于Vegan 软件包的生态学数据排序分析 赖江山 米湘成 (中国科学院植物研究所植被与环境变化国家重点实验室,北京 100093) 摘要:群落学数据一般是多维数据,例如物种属性或环境因子的属性。多元统计分析是群落生态学常用的分析方法,排序(ordination)是多元统计最常用的方法之一。CANOCO是广泛使用的排序软件,但缺点是商业软件价格不菲,版本更新速度也很慢。近年来,R语言以其灵活、开放、易于掌握、免费等诸多优点,在生态学和生物多样性研究领域迅速赢得广大研究人员的青睐。R语言中的外在软件包“Vegan”是专门用于群落生态学分析的工具。Vegan能够提供所有基本的排序方法,同时具有生成精美排序图的功能,版本更新很快。我们认为Vegan包完全可以取代CANOCO,成为今后排序分析的首选统计工具。本文首先简述排序的原理和类型,然后介绍Vegan的基本信息和下载安装过程,最后以古田山24公顷样地内随机抽取40个20m×20m的样方为例,展示Vegan包内各种常用排序方法(PCA,RDA,CA和CCA)和排序图生成过程,希望能为R的初学者尽快熟悉并利用Vegan包进行排序分析提供参考。

gtsdata

gtsenv.txt

赖江山.pdf

> setwd(“/winxp_disk2/shenzy/R/Vegan”) > gtsdata=read.table(“gtsdata.txt”, header=T) > gtsenv=read.table(“gtsenv.txt”, header=T) > install.packages(“vegan”) Installing package(s) into ‘/home/shenzy/R/x86_64-pc-linux-gnu-library/2.15’ (as ‘lib’ is unspecified) 试开URL’http://cran.csiro.au/src/contrib/vegan_2.0-4.tar.gz’ Content type ‘application/x-gzip’ length 1576584 bytes (1.5 Mb) 打开了URL ================================================== downloaded 1.5 Mb * installing *source* package ‘vegan’ … ** 成功将‘vegan’程序包解包并MD5和检查 ** libs gfortran -fpic -O3 […]

R OTU heatmap2

source(“http://www.bioconductor.org/biocLite.R”); biocLite(“affy”); biocLite(“Biobase”); library(affy); library(Biobase);

>bac_4sampledata=read.csv(“/home/R_heatmap/4sample_R_cluster.csv”, sep=”\t”) > row.names(bac_4sampledata)<-bac_4sampledata$Group > bac_4sample_Datamatrix<-data.matrix(bac_4sampledata[,2:5]) > heatmap.2(bac_4sample_Datamatrix, distfun=dist,col=greenred(256), scale=”row”, key=TRUE, symkey=FALSE, density.info=”none”, trace=”none”, cexRow=0.5, cexCol=0.7,margin=c(7,30), keysize=1.5);

4sample_R_cluster_stdtop100

> heatmap.2(bac_4sample_Datamatrix, distfun = function(x) dist(x,method = ‘euclidean’),hclustfun = function(x) hclust(x,method = ‘centroid’),col=greenred(256), scale=”row”, key=TRUE, symkey=FALSE, density.info=”none”, trace=”none”, cexRow=0.5, cexCol=0.7,margin=c(7,30), keysize=1.5); […]

454 pyrosequencing analysis pipeline

mothur > sffinfo(sff=454Reads_archaea.sff, flow=T) Extracting info from 454Reads_archaea.sff … 10000 20000 30000 40000 50000 60000 70000 80000 90000 92115 It took 68 secs to extract 92115. Output File Names: 454Reads_archaea.fasta 454Reads_archaea.qual 454Reads_archaea.flow

mothur > trim.flows(flow=454Reads_archaea.flow, oligos=oligos_LXY.txt, pdiffs=2, bdiffs=1, processors=2) Appending files from process 15674

Output File Names: 454Reads_archaea.trim.flow 454Reads_archaea.scrap.flow 454Reads_archaea.GZ_ARC.flow 454Reads_archaea.GZ1122_ARC.flow 454Reads_archaea.GZ1122cellulose_ARC.flow 454Reads_archaea.GZ_xylan_ARC.flow 454Reads_archaea.GZ_cellulose55_ARC.flow 454Reads_archaea.SHX_xylan_ARC.flow […]

My PROJECT

Amoa project

A2,A3 amoa sample sequencing

(1) use geneious to deal with the sequence raw data, get 635bp DNA sequence

(2)use mafft to get the alignment results

(3)mothur deal pipline

mothur > dist.seqs(fasta=AMOA_A2A3.mafft.align.shortname.fasta, output=lt)

0 0 59 0

Output File Name: AMOA_A2A3.mafft.align.shortname.phylip.dist

It took 0 to calculate the distances for 60 sequences.

mothur > cluster(phylip=AMOA_A2A3.mafft.align.shortname.phylip.dist, […]