Wilcoxon test batch-processing example

https://cloud.tencent.com/developer/article/1666764

#GRID_input_new_p10 group group
library(doBy) # use its summaryBy() to compute means and medians by group
# read the data
gene <- read.table('gene.txt', sep = '\t', row.names = 1, header = TRUE, stringsAsFactors = FALSE, check.names = FALSE)
group <- read.table('group.t[......]


[…]

Get statistics for each group (such as count, mean, etc) using pandas GroupBy?


Quick Answer:

The simplest way to get row counts per group is by calling .size(), which returns a Series:

df.groupby(['col1','col2']).size() 

Usually you want this result as a DataFrame (instead of a Series) so you can do:

df.groupby(['col1', 'col2']).size().reset_index(name='counts') 
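As a minimal sketch of the two calls above, the following builds a small, hypothetical DataFrame (the data and the `val` column are made up for illustration) and shows the Series and DataFrame forms of the per-group row counts:

```python
import pandas as pd

# Hypothetical example data; 'col1' and 'col2' mirror the names used above.
df = pd.DataFrame({
    "col1": ["A", "A", "B", "B", "B"],
    "col2": ["x", "x", "x", "y", "y"],
    "val": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Row counts per group as a Series indexed by (col1, col2).
counts_series = df.groupby(["col1", "col2"]).size()

# The same counts as a DataFrame with a named 'counts' column.
counts_df = df.groupby(["col1", "col2"]).size().reset_index(name="counts")
print(counts_df)
```

The DataFrame form is usually easier to merge or export, since the group keys become ordinary columns.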

If you[……]


[…]

Making a pairwise distance matrix in pandas

This is a somewhat specialized problem that forms part of a lot of data science and clustering workflows. It starts with a relatively straightforward question: if we have a bunch of measurements for two different things, how do we come up with a single number that represents the difference between t[……]
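One possible way to get a single difference number for every pair of items is a Euclidean distance matrix. The sketch below assumes Euclidean distance (the excerpt does not say which metric the post uses) and a hypothetical table with items as rows and features as columns:

```python
import numpy as np
import pandas as pd

# Hypothetical measurements: rows are items, columns are features.
data = pd.DataFrame(
    {"f1": [0.0, 3.0, 0.0], "f2": [0.0, 4.0, 1.0]},
    index=["a", "b", "c"],
)

values = data.to_numpy()
# Broadcasting gives every row-vs-row difference: shape (n, n, n_features).
diff = values[:, None, :] - values[None, :, :]
# Euclidean distance is the root of the summed squared differences.
dist = np.sqrt((diff ** 2).sum(axis=-1))

# Label rows and columns with the item names.
dist_df = pd.DataFrame(dist, index=data.index, columns=data.index)
print(dist_df)
```

The result is symmetric with zeros on the diagonal, which is the form most clustering routines expect.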


[…]

Pandas and Sklearn

Checking data for missing values with the pandas isnull function

pandas isnull sum with column headers

 

for col in main_df: print(sum(pd.isnull(data[col]))) 

I get a list of the null count for each column:

0 1 100 

What I’m trying to do is create a new dataframe which has the column header alongside the null count, e.[……]
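A minimal sketch of one way to pair each column header with its null count, assuming a hypothetical `main_df` (the excerpt is cut off before any answer, so this is not necessarily the approach the post settles on):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with some missing values.
main_df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [np.nan, 2.0, 3.0],
    "c": [np.nan, np.nan, 3.0],
})

# isnull() marks missing cells; sum() counts them per column.
null_counts = main_df.isnull().sum()

# Turn the Series into a two-column DataFrame: header plus null count.
null_df = null_counts.rename_axis("column").reset_index(name="null_count")
print(null_df)
```

The column names `column` and `null_count` are arbitrary labels chosen for this example.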


[…]

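Microbial diversity research: differential analysis

1. Random forest model. Random forest is an efficient machine-learning algorithm based on decision trees. It can be used to classify samples (classification) and for regression analysis (regression). It is a nonlinear classifier, so it can uncover complex nonlinear dependencies among variables. Random forest analysis can identify the key OTUs that distinguish two groups of samples. Feature Importance Scores table (from the random forest results)

It records how much each OTU contributes to the between-group difference.

Note: generally, OTUs with a Mean_decrease_in_accuracy value greater than 0.05 are selected for further [……]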


[…]

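ANOSIM, PERMANOVA/Adonis, MRPP (repost)

1. ANOSIM: analysis of similarities

  • Analysis of similarities (ANOSIM) is a nonparametric test of whether the differences between groups (two or more) are significantly larger than the differences within groups, and therefore whether the grouping is meaningful. First, the distance between every pair of samples is computed with the Bray-Curtis measure. All distances are then ranked from smallest to largest and the R value is calculated with the following formula. The samples are then permuted and R is recalculated; the probability that a permuted R exceeds the observed R is the P value.

[Image: ANOSIM R formula] [Image: ANOSIM box plot]

Note: the figure has N+1 boxes in total, where N is the number of groups. The "Between" box represents the differences between groups; the other boxes represent the differences within each group. R ranges from -1 to +1, and in practice usually falls between 0 and 1. An R value close to 1 indicates that between-group differences[……]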
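The ANOSIM procedure described above (rank all pairwise Bray-Curtis distances, compute R, permute the group labels) can be sketched as follows. The formula image did not survive in this excerpt, so the sketch assumes the standard ANOSIM statistic R = (r_B - r_W) / (n(n-1)/4), where r_B and r_W are the mean ranks of between-group and within-group distances and n is the number of samples. The OTU table, the function names, and the use of ">=" when counting permuted R values are illustrative assumptions.

```python
import numpy as np

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    return np.abs(x - y).sum() / (x + y).sum()

def average_ranks(values):
    """Rank values from smallest to largest, averaging ranks across ties."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Positions i..j are tied; give each the mean of ranks i+1..j+1.
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def anosim_r(dist_ranks, pairs, labels):
    """Assumed standard statistic: (r_B - r_W) / (n(n-1)/4)."""
    between = np.array([labels[i] != labels[j] for i, j in pairs])
    n = len(labels)
    r_b = dist_ranks[between].mean()
    r_w = dist_ranks[~between].mean()
    return (r_b - r_w) / (n * (n - 1) / 4.0)

def anosim(abund, labels, permutations=999, seed=0):
    labels = np.asarray(labels)
    n = len(labels)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    dists = [bray_curtis(abund[i], abund[j]) for i, j in pairs]
    ranks = average_ranks(dists)
    r_obs = anosim_r(ranks, pairs, labels)
    rng = np.random.default_rng(seed)
    # Count permutations whose R is at least as large as the observed R.
    hits = sum(
        anosim_r(ranks, pairs, rng.permutation(labels)) >= r_obs
        for _ in range(permutations)
    )
    # Include the observed arrangement itself in the permutation count.
    p_value = (hits + 1) / (permutations + 1)
    return r_obs, p_value

# Hypothetical OTU table: 6 samples (rows) x 3 OTUs (columns), two groups.
abund = np.array([
    [10, 0, 1], [9, 1, 0], [11, 0, 2],
    [0, 10, 5], [1, 9, 6], [0, 12, 4],
], dtype=float)
labels = ["g1", "g1", "g1", "g2", "g2", "g2"]
r_value, p_value = anosim(abund, labels)
print(r_value, p_value)
```

In this toy table every between-group distance is larger than every within-group distance, so R reaches its maximum of 1. With only three samples per group there are few distinct label arrangements, which limits how small the P value can get.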


[…]

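What exactly are the Adonis and ANOSIM tests? (repost)

When doing microbial 16S sequencing, the company's report often includes two tests, Adonis and ANOSIM. You have heard of t.test, wilcox, anova and other tests, so what are the Adonis and ANOSIM tests?

Adonis: multivariate analysis of variance

Adonis is a multivariate analysis of variance, also called nonparametric multivariate analysis of variance. It uses a distance matrix (for example, one based on Bray-Curtis or Euclidean distance) to partition the total variance, analyzes how much of the variation among samples each grouping factor explains, and uses a permutation test to assess statistical significance. Adonis results typically look like this: Index Df SumsOfSqs MeanSqs F.Model R2[……]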
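The variance partitioning described above can be sketched for a single grouping factor. This is a simplified illustration and not the Adonis implementation itself: it assumes the usual PERMANOVA decomposition, in which sums of squares are computed from squared pairwise distances, and it uses hypothetical data and a hypothetical function name.

```python
import numpy as np

def permanova(dist, labels, permutations=999, seed=0):
    """Pseudo-F, R2 and permutation p-value from a square distance matrix."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    n = len(labels)
    sq = dist ** 2
    # Total sum of squares: squared distances over all pairs, divided by n.
    ss_total = np.triu(sq, k=1).sum() / n

    def pseudo_f(lab):
        groups = np.unique(lab)
        ss_within = 0.0
        for g in groups:
            idx = np.flatnonzero(lab == g)
            sub = sq[np.ix_(idx, idx)]
            # Within-group squared distances, divided by the group size.
            ss_within += np.triu(sub, k=1).sum() / len(idx)
        ss_between = ss_total - ss_within
        a = len(groups)
        f = (ss_between / (a - 1)) / (ss_within / (n - a))
        return f, ss_between / ss_total

    f_obs, r2 = pseudo_f(labels)
    rng = np.random.default_rng(seed)
    # Permute the labels and count pseudo-F values at least as large.
    hits = sum(
        pseudo_f(rng.permutation(labels))[0] >= f_obs
        for _ in range(permutations)
    )
    return f_obs, r2, (hits + 1) / (permutations + 1)

# Hypothetical data: Euclidean distances between 6 points on a line.
points = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
dist = np.abs(points[:, None] - points[None, :])
labels = ["A", "A", "A", "B", "B", "B"]
f_value, r2, p_value = permanova(dist, labels)
print(f_value, r2, p_value)
```

Here R2 is the share of the total sum of squares attributed to the grouping factor, which corresponds to the R2 column in the results table, and the pseudo-F corresponds to F.Model.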


[…]

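Alpha diversity

Diversity indices in amplicon data analysis: alpha diversity

For diversity indices and their formulas, see: wikipedia. Alpha diversity is the analysis of species diversity within a single sample. It covers two factors: richness, the diversity of species categories in the sample, and evenness, the overall distribution of how much each species contributes. It is usually assessed with indices such as Richness, Chao1, Shannon, Simpson, Dominance and Equitability. Richness indices: Richness, Chao1 and Shannon are three indices commonly used to assess richness[……]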
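A minimal sketch of four of the indices named above, computed from one sample's OTU counts. The definitions are assumptions, since tools differ: Shannon here uses the natural logarithm (some pipelines use log2), Simpson is taken as 1 - sum(p_i^2) (some tools report sum(p_i^2) as Dominance), and Chao1 uses the bias-corrected form. The function names and the sample counts are hypothetical.

```python
import math

def richness(counts):
    """Observed richness: number of taxa with a non-zero count."""
    return sum(1 for c in counts if c > 0)

def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i), natural logarithm assumed."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def simpson(counts):
    """Simpson diversity, taken here as 1 - sum(p_i^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1*(F1-1) / (2*(F2+1))."""
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return richness(counts) + f1 * (f1 - 1) / (2.0 * (f2 + 1))

# Hypothetical OTU counts for one sample.
sample = [10, 10, 10, 10, 1, 1, 2, 0]
print(richness(sample), chao1(sample), shannon(sample), simpson(sample))
```

Richness and Chao1 depend only on which taxa are present and how many are rare, while Shannon and Simpson also respond to how evenly the counts are spread.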


[…]

Multivariate analyses in R (PERMANOVA)

https://rpubs.com/collnell/manova

Multivariate analyses in R

By C Nell

Types of questions

Do groups differ in composition? Does community structure vary among regions or over time? Do environmental variables explain community patterns? Which species are responsible for differences among g[……]


[…]

Correlation tests, correlation matrix, and corresponding visualization methods in R (repost)

https://rstudio-pubs-static.s3.amazonaws.com/240657_5157ff98e8204c358b2118fa69162e18.html


[…]