Learning enterotype analysis

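The enterotype concept was proposed in a 2011 paper, and in 2018, now drawing to a close, more than 20 leading gut microbiome researchers reviewed and reaffirmed it. I have long been curious about how to run an enterotype analysis in code. Today I found a tutorial, so I am keeping it here: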

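This is the original paper: Arumugam, M., Raes, J., et al. (2011) Enterotypes of the human gut microbiome, Nature, doi:10.1038/nature09944
A Google search showed that the authors have even published an enterotyping tutorial, which is worth studying: http://enterotyping.embl.de/enterotypes.html
The same researchers published a consensus article in 2018. A Chinese translation of it is here: http://blog.sciencenet.cn/blog-3334560-1096828.html
Of course, if you only need results for yourself or your own project and do not want to run any code, there is a newer web-based enterotyping tool that is easier to use. It is also mentioned in the translated article above: http://enterotypes.org/
Just upload a file of genus-level relative abundances and you get the result quickly.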

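Below I try out an analysis while learning, and keep the code here for future reference. The authors have already organized the code, so my goal is to work through it and then apply it to the data I have on hand.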

First, download the test data:
wget http://enterotyping.embl.de/MetaHIT_SangerSamples.genus.txt
wget http://enterotyping.embl.de/enterotypes_tutorial.sanger.R
Then run the example data and sort out any errors.

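My knowledge of R is still patchy, so for now I am content to get the script running and then use it as a tool on my own data. I am on a Hackintosh running OS X 10.11, and running the script reported that XQuartz was missing, so I installed it. Windows and Linux should not need it. The original code also complained that there was no function "s.class". A Baidu search turned up a Sina blog post saying that the function comes from the ade4 package, so I added library(ade4) and it worked.
XQuartz download for Mac OS X 10.6+: https://dl.bintray.com/xquartz/downloads/XQuartz-2.7.11.dmg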

 

#Uncomment next two lines if R packages are not already installed
#install.packages("cluster")
#install.packages("clusterSim")
library(cluster)
library(clusterSim)
#BiocManager::install("genefilter")
library(ade4)
#Download the example data and set the working directory […]
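As far as I understand the EMBL tutorial, samples are clustered on a distance computed between genus abundance profiles, namely the square root of the Jensen-Shannon divergence, followed by PAM clustering. The following is a minimal Python sketch of that distance only, not the tutorial's R code. The function name jsd_distance, the pseudocount value and the toy profiles are assumptions for illustration.

```python
import math


def jsd_distance(p, q, pseudocount=1e-6):
    """Square root of the Jensen-Shannon divergence between two profiles.

    p and q are assumed to be relative abundances over the same genera.
    """
    # Replace zeros with a small pseudocount so the logarithms are defined
    # (the pseudocount value is an assumption of this sketch).
    p = [x if x > 0 else pseudocount for x in p]
    q = [x if x > 0 else pseudocount for x in q]
    # Midpoint profile between the two samples.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kld(x, y):
        # Kullback-Leibler divergence of x from y.
        return sum(a * math.log(a / b) for a, b in zip(x, y))

    jsd = 0.5 * kld(p, m) + 0.5 * kld(q, m)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, jsd))


if __name__ == "__main__":
    sample_a = [0.6, 0.3, 0.1]  # toy genus proportions, not real data
    sample_b = [0.1, 0.3, 0.6]
    print(jsd_distance(sample_a, sample_b))
```

Identical profiles give a distance of 0, and profiles with no genera in common approach sqrt(ln 2), roughly 0.83.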

conda error: Solving environment: failed

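Here is how I approached the problem:
1. According to the error message, the installation most likely failed because the request to https://repo.anaconda.com/pkgs/main/noarch/repodata.json.bz2 failed.
2. So I tried to download that file manually with

wget https://repo.anaconda.com/pkgs/main/noarch/repodata.json.bz2

which produced the following error:

--2018-12-12 18:29:18-- https://repo.anaconda.com/pkgs/main/noarch/repodata.json.bz2
Connecting to 127.0.0.1:33473... failed: Connection refused.

The next step was to find out why the connection failed. 127.0.0.1 is the local machine, which should not be a problem in itself, so could the cause be that port 33473 is already in use?
3. Use netstat -ntpl to check which local ports are in use: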

netstat -ntpl
(Not all processes could be identified, non-owned process info will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto […]
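A side note on this diagnosis: when wget connects to 127.0.0.1 for a remote URL, one possible cause is a proxy setting (for example the http_proxy or https_proxy environment variables) that points at a local port where nothing is listening any more. "Connection refused" usually means that no process accepted the connection, rather than that the port is occupied. The sketch below, in Python, lists proxy-related environment variables and tests whether a local port accepts connections. The function names are hypothetical, and whether a proxy was really the cause here is an assumption.

```python
import os
import socket

# Environment variables that common command-line tools consult for proxies.
PROXY_VARS = ("http_proxy", "https_proxy", "all_proxy",
              "HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY")


def proxy_settings(environ=None):
    """Return the proxy-related variables that are set and non-empty."""
    environ = os.environ if environ is None else environ
    return {name: environ[name] for name in PROXY_VARS if environ.get(name)}


def port_is_listening(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused" as well as timeouts.
        return False


if __name__ == "__main__":
    print(proxy_settings())
    print(port_is_listening("127.0.0.1", 33473))
```

If a proxy variable points at 127.0.0.1:33473 and the port check returns False, unsetting that variable in the current shell would be the first thing to try.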

ssh, scp and rsync

[Admin.DESKTOP-7JT504C] ➤ rsync -P --rsh=ssh /drives/d/Kraken_12.tar.gz cityu_jhli_1@172.16.22.11:/BIGDATA1/cityu_jhli_1/mhyleung/database/findfungi
Warning: Permanently added '172.16.22.11' (RSA) to the list of known hosts.

rsync -P -avz -e "ssh -p5566 -i /drives/C/Users/Admin/Desktop/cityu_jhli_1.id" /drives/d/Kraken_12.tar.gz cityu_jhli_1@172.16.22.11:/BIGDATA1/cityu_jhli_1/mhyleung/database/findfungi

[Admin.DESKTOP-7JT504C] ➤ scp -P 5566 -i /drives/C/Users/Admin/Desktop/cityu_jhli_1.id -r /drives/d/Kraken_12.tar.gz cityu_jhli_1@172.16.22.11:/BIGDATA1/cityu_jhli_1/mhyleung/database/findfungi

 

ssh -p5566 -i /drives/C/Users/Admin/Desktop/cityu_jhli_1.id cityu_jhli_1@172.16.22.11

v2ray

https://github.com/Jrohy/multi-v2ray

 

Running with Docker

Create the default configuration (mKCP plus one randomly chosen disguise header):

docker run -d --name v2ray --privileged --restart always --network host jrohy/v2ray

Use a custom v2ray configuration file:

docker run -d --name v2ray --privileged -v /path/config.json:/etc/v2ray/config.json --restart always --network host jrohy/v2ray

View the v2ray configuration:

docker exec v2ray bash -c "v2ray info"

warning: on CentOS, the firewall must be turned off first

systemctl stop firewalld.service
systemctl disable firewalld.service

Multivariate analyses in R (PERMANOVA)

https://rpubs.com/collnell/manova

Multivariate analyses in R

By C Nell

Types of questions

Do groups differ in composition?
Does community structure vary among regions or over time?
Do environmental variables explain community patterns?
Which species are responsible for differences among groups?

Multivariate analysis of ecological communities with vegan

install.packages('vegan')
library(vegan) ##Community ecology: ordination, diversity & dissimilarities
Dataset […]
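Analyses of this kind start from a community matrix (samples by species) and a dissimilarity between samples. vegan's vegdist uses Bray-Curtis by default. The following is a minimal Python sketch of that dissimilarity, written only to show the arithmetic. The function names and toy counts are assumptions, and it is not a replacement for vegan.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    # Two empty samples are treated as identical here; this is a choice
    # made for the sketch, not necessarily what vegan does.
    return numerator / denominator if denominator else 0.0


def distance_matrix(samples):
    """Pairwise Bray-Curtis dissimilarities as a list of lists."""
    return [[bray_curtis(a, b) for b in samples] for a in samples]


if __name__ == "__main__":
    community = [
        [6, 7, 4],   # toy species counts for sample 1
        [10, 0, 6],  # sample 2
        [6, 7, 4],   # sample 3, identical to sample 1
    ]
    for row in distance_matrix(community):
        print(row)
```

A value of 0 means two samples have identical abundances, and 1 means they share no species.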

Good software

multiqc ranger plotly

https://github.com/MultiQC

https://github.com/ranger/ranger

https://zhuanlan.zhihu.com/p/34369349

R drawing png with high resolution

A reproducible example:

the_plot <- function() {
  x <- seq(0, 1, length.out = 100)
  y <- pbeta(x, 1, 10)
  plot(
    x, y,
    xlab = "False Positive Rate",
    ylab = "Average true positive rate",
    type = "l"
  )
}

 

png(
  "test.png",
  width = 3.25, height = 3.25, units = "in",
  res = 1200, pointsize = 4
) […]
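The png() call above gives the size in inches and the resolution in pixels per inch, so the pixel dimensions of the output are the product of the two. A small Python sketch of that arithmetic (the helper name pixel_size is hypothetical):

```python
def pixel_size(width_in, height_in, res_ppi):
    """Pixel dimensions of an image given its size in inches and its resolution."""
    return round(width_in * res_ppi), round(height_in * res_ppi)


if __name__ == "__main__":
    # Values taken from the png() call above.
    print(pixel_size(3.25, 3.25, 1200))  # (3900, 3900)
```

This is why a small figure at a high res value still produces a large file: 3.25 in at 1200 ppi is 3900 pixels on each side.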

tmap install

 

git clone https://github.com/GPZ-Bioinfo/tmap.git
cd tmap
ll
python setup.py install
ll
cd ../
rm -rf tmap
deactivate
rmvirtualenv tmap_ENV
mkvirtualenv -p /usr/bin/python3.4m tmap_ENV

pip3 install pypiwin32

conda install scipy

sudo pip3 install matplotlib

 

A good way to download sequences from NCBI by accession or GI number

cat file_with_ids.txt | while read p; do echo $p; esearch -db nucleotide -query $p | efetch -format fasta > $p.fasta; done;

 

 

cat ginumber.txt| while read p; do echo $p; efetch -db nucleotide -id $p -format gb > $p.gbk; done;
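The same batch download can also be sketched without EDirect by calling NCBI's E-utilities efetch endpoint over HTTP. The following Python sketch mirrors the loops above: one output file per identifier, FASTA by default, GenBank with rettype "gb". The helper names, the output naming and the pause between requests are assumptions, and NCBI's request-rate limits still apply.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(seq_id, rettype="fasta"):
    """Build the efetch URL for one nucleotide record."""
    params = {"db": "nucleotide", "id": seq_id,
              "rettype": rettype, "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def read_ids(path):
    """Read one identifier per line, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def download(seq_id, rettype="fasta", suffix=".fasta"):
    """Fetch one record and write it to <seq_id><suffix>."""
    with urlopen(efetch_url(seq_id, rettype)) as response:
        data = response.read()
    with open(seq_id + suffix, "wb") as out:
        out.write(data)


def download_all(path, rettype="fasta", suffix=".fasta", pause=0.5):
    """Download every identifier listed in the file at path."""
    for seq_id in read_ids(path):
        print(seq_id)
        download(seq_id, rettype, suffix)
        time.sleep(pause)  # pause length is an assumption, to stay polite
```

For example, download_all("ginumber.txt", rettype="gb", suffix=".gbk") would correspond to the second loop above.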

 

shenzy@SZYENVS:~/work/zhongshan/virus_database$ cat ginumber.txt| while read p; do echo $p; efetch -db nucleotide -id […]

fasta2nexus by R script

[Workspace loaded from ~/.RData]
> setwd("/home/shenzy/work/beast/51samples")
> library(seqinr)
> data=read.fasta("51strain_core_gene_alignment.aln")
> library(ape)
Attaching package: ‘ape’
The following objects are masked from ‘package:seqinr’:
    as.alignment, consensus
> write.nexus.data(data,file="51strain_core_gene_alignment.aln.nexus", format="DNA")
> […]
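To show what the conversion involves, here is a minimal Python sketch that reads an aligned FASTA file and writes a simple, non-interleaved NEXUS data block. It is not what ape's write.nexus.data produces byte for byte. The function names and the layout of the block are assumptions of this sketch.

```python
def read_fasta(text):
    """Parse FASTA text into {name: sequence}, keeping record order."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the taxon name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def to_nexus(records, datatype="DNA"):
    """Format aligned sequences as a NEXUS DATA block."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    nchar = lengths.pop()
    width = max(len(name) for name in records)
    lines = [
        "#NEXUS",
        "BEGIN DATA;",
        f"  DIMENSIONS NTAX={len(records)} NCHAR={nchar};",
        f"  FORMAT DATATYPE={datatype} MISSING=? GAP=-;",
        "  MATRIX",
    ]
    lines += [f"    {name.ljust(width)}  {seq}" for name, seq in records.items()]
    lines += ["  ;", "END;"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = ">strain1\nACGT\n>strain2\nAC-T\n"  # toy alignment
    print(to_nexus(read_fasta(example)))
```

The equal-length check matters because NEXUS declares a single NCHAR for the whole matrix, so an unaligned FASTA file cannot be converted directly.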