Python and PBS script

#!/bin/bash
#PBS -l walltime=48:00:00,nodes=8:ppn=4
#PBS -N bbmap_batch
#megahit -1 /disk/rdisk09/zhiyshen/combined_HKG_1.fastq.gz -2 /disk/rdisk09/zhiyshen/combined_HKG_2.fastq.gz -m 0.9 -o /disk/rdisk09/zhiyshen/HKG_coassembly_out --min-contig-len 2000 -t 28
source activate bbmap
cd /home/zhiyshen/zhiyshen/viral_bioaerosol/bbmap/HKG/test/
python2.7 /home/zhiyshen/zhiyshen/viral_bioaerosol/bbmap/HKG/test/batch_run_bbmap.py
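The batch driver is only referenced by path; a hypothetical sketch of what a batch_run_bbmap.py loop over paired-end samples might look like (the _1/_2 filename convention, reference name, and thread count are assumptions, and the command is printed rather than executed):

```python
#!/usr/bin/env python
# Hypothetical sketch of a batch driver for bbmap.sh: for every paired-end
# sample in the working directory, build a mapping command against a reference.
# The _1/_2 suffixes, reference path, and thread count are assumptions.
import glob
import os


def run_bbmap(r1, ref="contigs.fa", threads=4):
    r2 = r1.replace("_1.fastq.gz", "_2.fastq.gz")
    sample = os.path.basename(r1).replace("_1.fastq.gz", "")
    cmd = ["bbmap.sh",
           "in1=" + r1,
           "in2=" + r2,
           "ref=" + ref,
           "out=" + sample + ".sam",
           "threads=" + str(threads)]
    return cmd  # in the real script: subprocess.check_call(cmd)


if __name__ == "__main__":
    for r1 in sorted(glob.glob("*_1.fastq.gz")):
        print(" ".join(run_bbmap(r1)))
```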

 

#!/usr/bin/env python
# Python 2 script: the commands module was removed in Python 3.
import os, re, sys, string, commands, getopt, subprocess, glob
from os import path

#SL310935_adapter_remov[……]


R heatmap for ANI

data <- read.table("fastani_matrix.txt", header=TRUE)

(base) zyshen@wyq-P310:~/work/deltaBS/fastaANI$ more fastani_matrix.txt
        B1147      E4385      E4742      E4930      EC5350     ROAR019
B1147   100        94.681488  94.696358  94.648102  99.785301  94.803284
E4385   94.681488  100        96.50248   97.408295  94.496964  95.691849
E4742   94.696358  96.50248   100        96.831787  94.59523   95.332825
E4930   94.648102  97.408295  96.831787  100        94.446297  95.715088
EC5350  99.785301  94.496964  94.59523   94.446297  100        94.630646
ROAR019 94.803284  95.691849  95.332825  95.715088  94.630646  100

> data
            B1147     E4385     E4742[......]
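Although the post works in R, the same table can be prepared in Python as well; a small sketch that parses a subset of the ANI matrix with pandas and converts similarity to distance (the heatmap itself could then be drawn with seaborn.heatmap or R's heatmap functions):

```python
# Parse a FastANI all-vs-all matrix and turn it into a distance matrix
# (distance = 100 - ANI), the form clustering/heatmap functions expect.
# Only three of the six genomes are used here to keep the sketch short.
import io
import pandas as pd

matrix_txt = """\
\tB1147\tE4385\tEC5350
B1147\t100\t94.681488\t99.785301
E4385\t94.681488\t100\t94.496964
EC5350\t99.785301\t94.496964\t100
"""

ani = pd.read_csv(io.StringIO(matrix_txt), sep="\t", index_col=0)
dist = 100 - ani   # similar genomes get a small distance
print(dist)
```

With the full matrix loaded the same way, `dist` can be passed straight to a clustered heatmap routine.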


Metagenomics Tutorial (HUMAnN2)

https://github.com/LangilleLab/microbiome_helper/wiki/Metagenomics-Tutorial-(Humann2)


This tutorial is set up to walk you through the process of determining the taxonomic and functional composition of several metagenomic samples. It covers the use of MetaPhlAn2 (for taxonomic classification), HUMAnN2 (for functional classification), and STAMP (for visualization and statistical evaluation).

Throughout the tutorial, there are several questions to ensure that you are understanding the process (and not just copying and pasting). You can check the answer sheet after you have answered them yourself[……]


Metagenomics standard operating procedure v2

http://www.360doc.com/content/20/0823/10/71250389_931761136.shtml

https://github.com/LangilleLab/microbiome_helper/wiki/Metagenomics-standard-operating-procedure-v2

Note that this workflow is continually being updated. If you want to use the below commands be sure to keep track of them locally.

Last updated: 9 Oct 2018 (see the page revisions for earlier minor versions, and the earlier Metagenomics SOPs for earlier major versions)

Important points to keep in mind:

  • The below options are not necessarily best for your data; it is important to explore different options, especially at the r[……]


Learning enterotype analysis

Enterotypes were proposed in a 2011 paper, and in the almost-over year of 2018 more than twenty leading figures in gut-microbiome research revisited and reaffirmed the concept. I had long been curious how enterotypes are actually analyzed in code; today I found this tutorial, so I am putting it here:

This is the original paper: Arumugam, M., Raes, J., et al. (2011) Enterotypes of the human gut microbiome, Nature, doi:10.1038/nature09944. A quick Google search shows that the authors actually published a tutorial on enterotyping, which is worth studying: http://enterotyping.embl.de/enterotypes.html This is the 2018 consensus paper by the field's leaders, and here is a Chinese translation of it: http://blog.sciencenet.cn/blog-3334560-1096828.html Of course, if you only need results for your own samples or project and do not want to run code, there is a newer web version of the typing that is even easier to use; the URL, also mentioned in the translated article above, is http://enterotypes.org/. Just upload a file of genus-level relative abundances and you get results very quickly.

Below I will try the analysis while learning, and keep the code here as a reference. The author has in fact already organized the code; I will work through it and aim to apply it to my own data.

First, download the test data:

wget http://enterotypi[......]
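The EMBL tutorial clusters samples by Jensen-Shannon distance before PAM clustering; the core metric can be sketched in a few lines of numpy (the pseudocount value and the two toy genus profiles below are made up):

```python
# Jensen-Shannon distance between two genus-abundance profiles, the metric
# the enterotyping tutorial uses before PAM clustering of samples.
import numpy as np


def jsd(p, q, pseudocount=1e-9):
    # Add a tiny pseudocount so zero abundances do not break the log.
    p = np.asarray(p, dtype=float) + pseudocount
    q = np.asarray(q, dtype=float) + pseudocount
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kld = lambda a, b: np.sum(a * np.log(a / b))
    return np.sqrt(0.5 * kld(p, m) + 0.5 * kld(q, m))


sample_a = [0.6, 0.3, 0.1]  # toy Bacteroides-dominated profile (made up)
sample_b = [0.1, 0.2, 0.7]  # toy Prevotella-dominated profile (made up)
print(jsd(sample_a, sample_a))  # identical profiles -> 0.0
print(jsd(sample_a, sample_b))
```

Computing this for every pair of samples gives the distance matrix that the clustering step consumes.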


Making a pairwise distance matrix in pandas

This is a somewhat specialized problem that forms part of a lot of data science and clustering workflows. It starts with a relatively straightforward question: if we have a bunch of measurements for two different things, how do we come up with a single number that represents the difference between the two things?

An example will make the question clearer. Let’s load our olympic medal dataset:

import pandas as pd
pd.options.display.max_rows = 10
pd.options.display.max_columns = 6
data = pd.read_csv("https://raw.githubusercontent.com/mojones/binders/master/olympics.csv", sep="\t")
data

[……]
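The excerpt cuts off before the distance step, but the idea can be compressed into a short sketch on a made-up stand-in for the medal table (country names and counts invented): one row per thing, then all pairwise Euclidean distances via numpy broadcasting.

```python
# Pairwise Euclidean distance matrix from a feature table, via broadcasting.
import numpy as np
import pandas as pd

# Made-up stand-in for per-country medal counts (gold, silver, bronze).
counts = pd.DataFrame(
    [[10, 5, 3], [8, 6, 4], [1, 2, 9]],
    index=["USA", "GBR", "KEN"],
    columns=["gold", "silver", "bronze"],
)

X = counts.to_numpy(dtype=float)
# (n, 1, k) - (1, n, k) broadcasts to an (n, n, k) array of differences.
diff = X[:, None, :] - X[None, :, :]
dist = pd.DataFrame(np.sqrt((diff ** 2).sum(axis=-1)),
                    index=counts.index, columns=counts.index)
print(dist.round(2))
```

Each cell is then a single number summarizing how different two rows are, which is exactly what clustering workflows consume.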


Pandas and Sklearn

Using the pandas isnull function to check for missing data

pandas isnull sum with column headers

 

for col in main_df:
    print(sum(pd.isnull(main_df[col])))

I get a list of the null count for each column:

0
1
100

What I’m trying to do is create a new dataframe which has the column header alongside the null count, e.g.

col1 | 0
col2 | 1
col3 | 100

 

 

# Print the null count alongside every column name using:

 

nulls = df.isnull().sum().to_frame()
for index, row in nulls.iterrows():
    print(index, row.iloc[0])

 

# Print the unique values in every column:
for col in df:
    print(df[col].unique())
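The two-column layout asked for above can also be produced directly, without iterating (the toy frame here is illustrative):

```python
# Build a "column name | null count" DataFrame in one chained expression.
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3],
                   "col2": [1, np.nan, 3],
                   "col3": [np.nan, np.nan, np.nan]})

nulls = (df.isnull().sum()
           .reset_index(name="null_count")
           .rename(columns={"index": "column"}))
print(nulls)
```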



How to use pandas.get_dummies (One-Hot Encoding)

[……]
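The get_dummies excerpt is truncated here; its basic use is a one-liner (the toy column is invented):

```python
# get_dummies turns one categorical column into 0/1 indicator columns.
import pandas as pd

df = pd.DataFrame({"city": ["HK", "SZ", "HK"]})
dummies = pd.get_dummies(df["city"], prefix="city")
print(dummies)
```

Each distinct value becomes its own indicator column, which is what sklearn estimators expect in place of raw category labels.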


conda error: Solving environment: failed

Here is my troubleshooting approach:
1. Judging from the error message, the installation failed because the request for https://repo.anaconda.com/pkgs/main/noarch/repodata.json.bz2 failed.
2. So I tried to download the file manually with

wget https://repo.anaconda.com/pkgs/main/noarch/repodata.json.bz2

which produced the following error:

--2018-12-12 18:29:18-- https://repo.anaconda.com/pkgs/main/noarch/repodata.json.bz2
Connecting to 127.0.0.1:33473... failed: Connection refused.

Time to find the cause of the failure. 127.0.0.1 is the local machine, which by itself should not be a problem, so could the issue be that port 33473 is occupied?
3. Check local port usage with netstat -ntpl:

netstat -ntpl
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)[……]


DBS model training

phenotype gproNOG.annotations bitscores

This is the complete workflow used to generate a random forest model using output data from an hmmsearch of your protein coding genes against eggNOG gamma proteobacterial protein HMMs. Before running this notebook, run the parse_hmmsearch.pl script to get a tab-delimited file containing bitscores for all isolates.

```{r, read in data}
# library(gplots)
library(caret)
library(randomForest)
set.seed(1)
# set the directory you are working from
directory <- "/home/shenzy/UNISED/CII_ECOR_update_final_164genes"
# Reading in the eggNOG model sco[……]
```
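The R workflow above trains the model with caret/randomForest; the same shape of analysis sketched in Python on a synthetic bitscore matrix (the dimensions, labels, and feature setup are all invented) would look roughly like:

```python
# Random forest on an isolates-by-HMM bitscore matrix, sketched with sklearn
# on synthetic data (the real workflow above does this in R with caret).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n_isolates, n_hmms = 60, 20
X = rng.normal(size=(n_isolates, n_hmms))    # fake bitscore matrix
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # fake binary phenotype

clf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X, y)
# Feature importances play the role of mean-decrease-in-accuracy scores.
ranked = np.argsort(clf.feature_importances_)[::-1]
print("most important HMM feature:", ranked[0])
```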


Microbial diversity research: differential analysis

1. Random forest models

Random forest is an efficient machine-learning algorithm based on decision trees; it can be used to classify samples (classification) as well as for regression analysis (regression).

It is a non-linear classifier, so it can uncover complex non-linear interdependencies among variables. Random forest analysis can identify the key OTUs that separate two groups of samples.

The Feature Importance Scores table from the random forest output

records how much each OTU contributes to the between-group differences.

Note: as a rule of thumb, OTUs with a Mean_decrease_in_accuracy above 0.05 are selected for further analysis; for samples with small between-group differences the threshold may need to drop to 0.03.

2. Cross-validation analysis

Cross-validation is a practical statistical technique for splitting a data sample into smaller subsets: the analysis is first performed on one subset, while the other subsets are used afterwards to confirm and validate it. The initial subset is called the training set; the other subsets are called the validation or test sets.

The most common form is k-fold cross-validation, in which all the data are divided into k subsets and each subset serves as the test set exactly once, with the rest used for training. Cross-validation is thus repeated k times, choosing a different subset as the test set each time, and the average classification accuracy over the k rounds is reported as the result.

Every sample is used for both training and testing, and each sample is [……]
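The k-fold procedure described here is only a few lines with scikit-learn; a sketch on an invented OTU count table (the counts, group labels, and the discriminative OTU are all made up):

```python
# 5-fold cross-validation of a random forest, as described above: each of
# the k subsets is the test set once, and the k accuracies are averaged.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.poisson(5, size=(80, 30)).astype(float)  # fake OTU count table
y = np.repeat([0, 1], 40)                        # two sample groups
X[y == 1, 0] += 5                                # make OTU 0 discriminative

scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5)
print("mean accuracy over 5 folds:", round(scores.mean(), 3))
```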
