用DESeq进行差异分析的源代码

要保证当前文件夹下面有了742KO1.count等4个文件,就是用htseq对比对的bam文件进行处理后的输出文件

library(DESeq) #加载数据 K1=read.table(“742KO1.count”,row.names=1) K2=read.table(“743KO2.count”,row.names=1) W1=read.table(“740WT1.count”,row.names=1) W2=read.table(“741WT2.count”,row.names=1) #列名 data=cbind(K1,K2,W1,W2) #如果是htseq的结果,则删除data最后四行 n=nrow(data) data=data

[c language=”(-n+4:-n),”][/c]

#如果是bedtools的结果,取出统计个数列和行名 kk1=cbind(K1$V5) rownames(kk1)=rownames(K1) K1=kk1

#差异分析 colnames(data)=c(“K1″,”K2″,”W1″,”W2″) type=rep(c(“K”,”W”),c(2,2)) de=newCountDataSet(data,type) de=estimateSizeFactors(de) de=estimateDispersions(de) res=nbinomTest(de,”K”,”W”)

#res就是我们的表达量检验结果

到这里,理论上差异基因的分析已经结束啦!后面只是关于R的bioconductor包的一些简单结合使用而已

library(org.Mm.eg.db)

tmp=select(org.Mm.eg.db, keys=res$id, columns=c(“ENTREZID”,”SYMBOL”), keytype=”ENSEMBL”)

#合并res和tmp res=merge(tmp,res,by.x=”ENSEMBL”,by.y=”id”,all=TRUE)

#go tmp=select(org.Mm.eg.db, keys=res$ENSEMBL, columns=”GO”, keytype=”ENSEMBL”) ensembl_go=unlist(tapply(tmp[,2],as.factor(tmp[,1]),function(x) paste(x,collapse =”|”),simplify =F))

#为res加入go注释, res$go=ensembl_go[res$ENSEMBL]#为res加入一列go

#写入all——data all_res=res write.csv(res,file=”all_data.csv”,row.names =F)

uniq=na.omit(res)#删除无效基因 sort_uniq=uniq[order(uniq$padj),]#按照矫正p值排序

#写入排序后的all_data write.csv(res,file=”all_data.csv”,row.names =F)

#标记上下调基因 […]

getfasta (bedtools)

getfasta

bedtools getfasta extracts sequences from a FASTA file for each of the intervals defined in a BED/GFF/VCF file.

Tip

1. The headers in the input FASTA file must exactly match the chromosome column in the BED file.

2. You can use the UNIX fold command to set the line width of […]

How to Compare Regression Slopes

How to Compare Regression Slopes Jim Frost 13 January, 2016 9 75 83 27

If you perform linear regression analysis, you might need to compare different regression lines to see if their constants and slope coefficients are different. Imagine there is an established relationship between X and Y. Now, suppose you want to determine […]