R OTU heatmap (heatmap.2)

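source("http://www.bioconductor.org/biocLite.R")   # legacy installer; Bioconductor 3.8+ uses BiocManager::install() instead
biocLite("affy")
biocLite("Biobase")
library(affy)
library(Biobase)
library(gplots)   # heatmap.2() and greenred() come from the CRAN package gplots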

bac_4sampledata = read.csv("/home/R_heatmap/4sample_R_cluster.csv", sep="\t")   # tab-separated despite the .csv extension
row.names(bac_4sampledata) <- bac_4sampledata$Group
bac_4sample_Datamatrix <- data.matrix(bac_4sampledata[, 2:5])
heatmap.2(bac_4sample_Datamatrix, distfun=dist, col=greenred(256), scale="row", key=TRUE, symkey=FALSE, density.info="none", trace="none", cexRow=0.5, cexCol=0.7, margins=c(7,30), keysize=1.5)
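With scale="row", heatmap.2 centers and scales each row (each OTU), so the colours show how an OTU varies across the four samples rather than its absolute count. A small Python sketch of that row z-scoring on made-up counts, only to show the arithmetic. One difference is my own choice: R gives NaN for a constant row, while this sketch returns 0.0.

```python
import statistics


def scale_rows(matrix):
    """Center and scale each row to a z-score (row mean 0, sample SD 1)."""
    scaled = []
    for row in matrix:
        mu = statistics.mean(row)
        sd = statistics.stdev(row)  # sample SD (n - 1 denominator), like R's sd()
        if sd == 0:
            # R would produce NaN for a constant row; this sketch uses 0.0 instead
            scaled.append([0.0] * len(row))
        else:
            scaled.append([(v - mu) / sd for v in row])
    return scaled


# Made-up counts: two OTUs (rows) across four samples (columns)
counts = [[10, 20, 30, 40],
          [5, 5, 5, 6]]
for z in scale_rows(counts):
    print([round(v, 3) for v in z])
```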

4sample_R_cluster_stdtop100

heatmap.2(bac_4sample_Datamatrix, distfun=function(x) dist(x, method='euclidean'), hclustfun=function(x) hclust(x, method='centroid'), col=greenred(256), scale="row", key=TRUE, symkey=FALSE, density.info="none", trace="none", cexRow=0.5, cexCol=0.7, margins=c(7,30), keysize=1.5); […]

454 pyrosequencing analysis pipeline

mothur > sffinfo(sff=454Reads_archaea.sff, flow=T)
Extracting info from 454Reads_archaea.sff ...
10000 20000 30000 40000 50000 60000 70000 80000 90000 92115
It took 68 secs to extract 92115.
Output File Names:
454Reads_archaea.fasta
454Reads_archaea.qual
454Reads_archaea.flow

mothur > trim.flows(flow=454Reads_archaea.flow, oligos=oligos_LXY.txt, pdiffs=2, bdiffs=1, processors=2)
Appending files from process 15674

Output File Names:
454Reads_archaea.trim.flow
454Reads_archaea.scrap.flow
454Reads_archaea.GZ_ARC.flow
454Reads_archaea.GZ1122_ARC.flow
454Reads_archaea.GZ1122cellulose_ARC.flow
454Reads_archaea.GZ_xylan_ARC.flow
454Reads_archaea.GZ_cellulose55_ARC.flow
454Reads_archaea.SHX_xylan_ARC.flow
[…]
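trim.flows uses the oligos file to find each read's barcode (allowing bdiffs mismatches) and primer (allowing pdiffs mismatches), then writes one .flow file per group, which is where the per-sample files above come from. trim.flows works on flowgrams; the Python sketch below only illustrates the barcode/primer matching idea on plain base sequences, with simple position-by-position mismatch counting and IUPAC codes. That is a simplification, not mothur's actual algorithm, and the oligos lines and reads are made up.

```python
# IUPAC nucleotide codes -> bases they match
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(pattern, seq):
    """Position-by-position mismatches between pattern and the start of seq."""
    if len(seq) < len(pattern):
        return len(pattern)  # read too short: treat as no match
    return sum(1 for p, s in zip(pattern, seq) if s not in IUPAC.get(p, p))


def parse_oligos(lines):
    """Read 'forward PRIMER' and 'barcode SEQ GROUP' lines; skip comments."""
    primers, barcodes = [], {}
    for line in lines:
        cols = line.split()
        if not cols or cols[0].startswith("#"):
            continue
        kind = cols[0].lower()
        if kind == "forward":
            primers.append(cols[1].upper())
        elif kind == "barcode" and len(cols) >= 3:
            barcodes[cols[1].upper()] = cols[2]
    return primers, barcodes


def assign_read(seq, primers, barcodes, bdiffs=1, pdiffs=2):
    """Return the group name for a read, or None if it would be scrapped."""
    seq = seq.upper()
    hits = [(mismatches(bc, seq), group, len(bc)) for bc, group in barcodes.items()]
    hits = [h for h in hits if h[0] <= bdiffs]
    if not hits:
        return None
    best = min(h[0] for h in hits)
    best_hits = [h for h in hits if h[0] == best]
    if len(best_hits) > 1:
        return None  # ambiguous barcode
    _, group, bc_len = best_hits[0]
    after_barcode = seq[bc_len:]
    if not any(mismatches(p, after_barcode) <= pdiffs for p in primers):
        return None  # primer not found within pdiffs
    return group


# Hypothetical oligos file content
oligos = ["# hypothetical oligos", "forward CCGTCAATTCMTTTRAGT",
          "barcode AACCAACC S1", "barcode GGTTGGTT S2"]
primers, barcodes = parse_oligos(oligos)
print(assign_read("AACCAACC" + "CCGTCAATTCATTTGAGT" + "ACGTACGT", primers, barcodes))
```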

Python dict problem

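dic_cog_annot[h2[1]] = {'orthologous_group': h[3]}
dic_cog_annot[h2[1]]['protein_annot'] = h[4]

The code above adds a protein_annot key next to orthologous_group.

If it is written as below instead, the inner dict is replaced and only one key is left!

dic_cog_annot[h2[1]] = {'orthologous_group': h[3]}
dic_cog_annot[h2[1]] = {'protein_annot': h[4]}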
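A runnable version of both cases, with made-up h and h2 lists standing in for the split input lines (their real contents are not shown here). dict.setdefault is one way to add keys without accidentally replacing the inner dict.

```python
# Made-up split lines standing in for the original h and h2
h = ["col0", "col1", "col2", "COGxxxx", "example protein annotation"]
h2 = ["col0", "gene_001"]

# Case 1: create the inner dict, then add a second key to it
dic_cog_annot = {}
dic_cog_annot[h2[1]] = {'orthologous_group': h[3]}
dic_cog_annot[h2[1]]['protein_annot'] = h[4]
print(dic_cog_annot)

# Case 2: the second assignment rebinds the key to a brand-new dict
dic_replaced = {}
dic_replaced[h2[1]] = {'orthologous_group': h[3]}
dic_replaced[h2[1]] = {'protein_annot': h[4]}
print(dic_replaced)  # orthologous_group is gone

# setdefault creates the inner dict only if it is missing, then adds the key
dic_safe = {}
dic_safe.setdefault(h2[1], {})['orthologous_group'] = h[3]
dic_safe.setdefault(h2[1], {})['protein_annot'] = h[4]
print(dic_safe)
```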

 

dic[name] = "NA"

dic[name] = {'nr_annot': "NA"}
dic[name]['go_num'] = "NA"
dic[name]['go_info'] = "NA"
dic[name]['kegg_hit'] = "NA"
dic[name]['kegg_annot'] = "NA"
dic[name]['swiss_annot'] = "NA"
dic[name]['cog_hit'] = "NA"
dic[name]['orthologous_group'] = "NA"
dic[name]['protein_annot'] = "NA"
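Instead of one assignment per field, the same all-"NA" record can be built from a list of field names. A minimal sketch; ANNOT_FIELDS and empty_annotation() are names I made up, and the field list is copied from the lines above. dict.fromkeys is safe here because the shared default "NA" is an immutable string; with a mutable default such as [] every key would share the same list.

```python
# Field names copied from the assignments above
ANNOT_FIELDS = ['nr_annot', 'go_num', 'go_info', 'kegg_hit', 'kegg_annot',
                'swiss_annot', 'cog_hit', 'orthologous_group', 'protein_annot']


def empty_annotation():
    """Return a fresh dict with every annotation field set to "NA"."""
    return dict.fromkeys(ANNOT_FIELDS, "NA")


dic = {}
name = "gene_001"                  # hypothetical gene name
dic[name] = empty_annotation()
dic[name]['kegg_hit'] = "K00001"   # hypothetical hit; other fields stay "NA"
print(dic[name])
```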

 

Fixing a BLAST error with awk

BlastOutput.iterations.E. Invalid value(s) [9] in VisibleString [AGO_S073922.1_geneid.geneid 2 exon (s) 261 – 281 6 aa, chain – incompleted …]
[blastall] ERROR: 20110903163100/t1.fna.blast.m7Output BlastOutput.iterations.E. Invalid value(s) [9] in VisibleString [AGO_S020563.1_geneid.geneid 2 exon (s) 1646 – 1669 6 aa, chain + incompleted …]
[blastall] ERROR: 20110903163100/t1.fna.blast.m7Output BlastOutput.iterations.E. Invalid value(s) [9] in VisibleString [AGO_S075634.1_geneid.geneid 2 exon (s) 252 […]
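The [9] in these messages is most likely character code 9, a tab, inside the geneid FASTA headers, which the XML (-m 7) output refuses to write into a VisibleString. The awk command itself is cut off above; assuming tabs are the culprit, the same cleanup as a Python sketch replaces tabs in header lines only, before running blastall. The file names in the comment are placeholders.

```python
def clean_fasta_headers(lines, replacement=" "):
    """Yield FASTA lines, replacing tab characters in '>' header lines only."""
    for line in lines:
        if line.startswith(">"):
            line = line.replace("\t", replacement)
        yield line


# Made-up example: a geneid-style header containing tabs
fasta = [">seq1\tgeneid\t2 exon(s)\n", "MKTAYIAK\n"]
print("".join(clean_fasta_headers(fasta)), end="")

# To clean a whole file (placeholder file names):
# with open("t1.fna") as src, open("t1.clean.fna", "w") as dst:
#     dst.writelines(clean_fasta_headers(src))
```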

phrap_merge pipeline

sed '/NUCMER/d' Kall_usnused.delta > Kall_usnused.delta1
sed '/MD0\/work/d' Kall_usnused.delta1 > Kall_usnused.delta2

cat - all.delta.filter2 <<< "NUCMER" > all.delta.filter3

cat - all.delta.filter3 <<< "/MD0/work/aphis_phrap/aphisfinish_K31_up200.fna /MD0/work/aphis_phrap/20110718105830/t10.fna " > all.delta.filter4
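A nucmer .delta file begins with a two-line header: the reference and query FASTA paths on line 1, and NUCMER on line 2. The sed/cat steps above drop every old header line from the merged file and then push one header back on top (NUCMER first, then the paths, so the paths end up on line 1). A Python sketch of the same idea, assuming header lines can be recognised the same way the sed commands do (containing "NUCMER" or "MD0/work"); the paths in the example are made up.

```python
def rewrite_delta_header(lines, ref_path, qry_path, path_tag="MD0/work"):
    """Remove old delta header lines and prepend a single new header.

    Mirrors the sed/cat steps: any line containing "NUCMER" or path_tag
    is dropped, then "<ref_path> <qry_path>" and "NUCMER" are put on top.
    """
    body = [ln for ln in lines if "NUCMER" not in ln and path_tag not in ln]
    return [f"{ref_path} {qry_path}\n", "NUCMER\n"] + body


# Made-up merged delta content containing two old headers
merged = [
    "/MD0/work/ref.fna /MD0/work/a.fna\n", "NUCMER\n",
    ">r1 q1 1000 900\n", "1 50 1 50 0 0 0\n", "0\n",
    "/MD0/work/ref.fna /MD0/work/b.fna\n", "NUCMER\n",
    ">r2 q2 800 700\n",
]
fixed = rewrite_delta_header(merged, "/MD0/work/ref.fna", "/MD0/work/all.fna")
print("".join(fixed), end="")
```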

show-coords -clorH 454AllContigs_s_unal1_378.contigs.delta.filter > 454AllContigs_s_unal1_378.contigs.delta.filter.cootds3
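With -c and -l, show-coords adds length and coverage columns, -o appends an overlap annotation such as [CONTAINS], -r sorts by reference and -H drops the header. A sketch of parsing one such line in Python, assuming the column order S1 E1 | S2 E2 | LEN1 LEN2 | %IDY | LENR LENQ | COVR COVQ | ref_tag qry_tag [annotation]; check it against your own output before relying on it. The example line is made up.

```python
def parse_coords_line(line):
    """Parse one `show-coords -clorH` line (assumed column order) into a dict."""
    fields = [tok for tok in line.split() if tok != "|"]  # drop column separators
    if len(fields) < 13:
        raise ValueError(f"unexpected show-coords line: {line!r}")
    s1, e1, s2, e2, len1, len2 = (int(x) for x in fields[:6])
    return {
        "s1": s1, "e1": e1, "s2": s2, "e2": e2,
        "len1": len1, "len2": len2,
        "idy": float(fields[6]),
        "len_r": int(fields[7]), "len_q": int(fields[8]),
        "cov_r": float(fields[9]), "cov_q": float(fields[10]),
        "ref": fields[11], "qry": fields[12],
        # -o overlap annotation such as [CONTAINS]; empty string if absent
        "note": " ".join(fields[13:]),
    }


# Made-up line in the assumed layout
example = "1 5000 | 101 5100 | 5000 5000 | 99.50 | 200000 5100 | 2.50 98.04 | ctgA\tctgB\t[CONTAINS]"
rec = parse_coords_line(example)
print(rec["ref"], rec["qry"], rec["idy"], rec["note"])
```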

./get_phrap_pair.py -i total.coords3 -j total.aphis.fna -o aphis.pair.info &

./singleLinkageClustering.pl aphis.pair.info aphis.pair.info.clus
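Single-linkage clustering of a pair list is the same as finding connected components: any two contigs joined by a chain of pairs end up in one cluster. The Perl script is not shown, so this is only a Python sketch using union-find, and read_pairs() assumes (hypothetically) that each line of aphis.pair.info holds two contig IDs in its first two columns.

```python
def single_linkage(pairs):
    """Group IDs into clusters: connected components of the pair graph."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b in pairs:
        union(a, b)

    clusters = {}
    for node in parent:
        clusters.setdefault(find(node), []).append(node)
    # Largest clusters first, members sorted for stable output
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (-len(c), c))


def read_pairs(path):
    """Hypothetical reader: first two whitespace-separated columns are contig IDs."""
    with open(path) as fh:
        for line in fh:
            cols = line.split()
            if len(cols) >= 2:
                yield cols[0], cols[1]


# Made-up pairs; with a real file: single_linkage(read_pairs("aphis.pair.info"))
for cluster in single_linkage([("c1", "c2"), ("c3", "c4"), ("c2", "c5")]):
    print("\t".join(cluster))
```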

./split_clus_file.pl <dir> <clus_file> <total_contigfile> <clus_splitsize>

My amoA work

aa

Sequence length distribution histogram

Step 1: the summary.seqs command in mothur makes it easy to get the length distribution.

mothur > summary.seqs(fasta=AMIgene_11a.pep)

             Start  End  NBases  Ambigs  Polymer  NumSeqs
Minimum:     1      20   20      16      4        1
2.5%-tile:   1      55   55      42      8        35
25%-tile:    1      144  144     112     14       350
Median:      1      242  242     189     17       699
75%-tile:    1      380  380     293     21       1048
97.5%-tile:  1      828  828     646     33       1363
Maximum: […]
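The later steps are cut off above. As a sketch of the same idea without mothur, the lengths can be read straight from the FASTA file and binned into a text histogram; this assumes a plain (possibly multi-line) FASTA file, and the 100-residue bin width is an arbitrary choice.

```python
from collections import Counter


def read_fasta_lengths(lines):
    """Return the length of each record in a (multi-line) FASTA file."""
    lengths, current = [], None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def length_histogram(lengths, bin_width=100):
    """Count lengths per bin; each bin is keyed by its lower bound."""
    bins = Counter((n // bin_width) * bin_width for n in lengths)
    return sorted(bins.items())


def print_histogram(hist, bin_width=100, scale=1):
    """Print one row per bin; scale shrinks the bars for large counts."""
    for start, count in hist:
        print(f"{start:>6}-{start + bin_width - 1:<6} {'#' * (count // scale)} {count}")


# Made-up records; with a real file: read_fasta_lengths(open("AMIgene_11a.pep"))
fasta = [">a\n", "MKT\n", "LLV\n", ">b\n", "M\n", ">c\n", "A" * 150 + "\n"]
print_histogram(length_histogram(read_fasta_lengths(fasta)))
```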

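Blast2GO: installing and running a local database, plus simple online use

I recently needed to annotate a set of predicted genes, so I started working with Blast2GO.

The simplest way is to use the free online database access from the official website (this requires the Java Runtime Environment (JRE) from http://www.java.com/download).

The steps are:

(1) Go to the official site http://www.blast2go.com/b2glaunch/start-blast2go

Choose the memory size you need and click "here". If it does not start directly online, you will be asked to save and download the blast2go.jnlp file.

(2) Then run javaws blast2go.jnlp on the command line and press Enter. The interface appears, and the rest is just clicking through it and running!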

##############################################

Running from the command line with a local database:

B2G4PIPE – Blast2GO without graphical interface

1. Download the relevant resources from http://www.blast2go.com/b2glaunch/resources:

http://www.blast2go.com/data/blast2go/b2g4pipe_v2.5.zip

http://www.blast2go.com/data/blast2go/local_b2g_db_tutorial_0809.zip

Download the files needed for the b2g database:

http://archive.geneontology.org/latest-full/go-assocdb-data.gz

ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz

ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz

ftp://ftp.pir.georgetown.edu/databases/idmapping/idmapping.tb.gz

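(Optional, depending on your MySQL version)

Replace TYPE=MyISAM with ENGINE=MyISAM in b2g_db.sql (newer MySQL releases reject the old TYPE= table option).

Make the same replacement in go_201110-assocdb-data:

sed -i 's/TYPE=MyISAM/ENGINE=MyISAM, DEFAULT CHARACTER SET latin1/' go_201110-assocdb-data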

2. Edit and then run download_and_install.sh from the tutorial, or run the steps manually as follows:

3. Edit and run b2g_db.sql:

[…]

gene prediction pipeline

augustus

use EST file:
blat -minIdentity=92 aphis_genome_scaffold_v3.1.fa aphis-unigene.idx.fna aphis-unigene.idx.psl
/home/soft/pslCDnaFilter -maxAligns=1 aphis-unigene.idx.psl aphis-unigene.idx.f.psl
/home/soft/augustus.2.5/scripts/blat2hints.pl --in=aphis-unigene.idx.f.psl --out=hints.E.gff
./augustusRun.pl /public/d_aphis/DNA_assembly/3.1fix/gene_predict/augustus aphis_genome_scaffold_v3.1.fa pea_aphid hints.E.gff 15
./augustusRun.py -i aphis_genome_scaffold_v3.1.fa -n pea_aphid -j hints.E.gff -s 35000000 &
# ----- prediction on sequence number 3 (length = 1746996, name = AGO_S000003) -----

geneid:
./scripts/gffftablExport.pl aphis.geneid.model.gff3 aphis.geneid geneid

blast2go
-rw-r--r-- 1 zhaoqy users 90672652 2010-10-15 gene2go
-rw-r--r-- 1 zhaoqy users 19294682 […]

My PROJECT

amoA project

A2, A3 amoA sample sequencing

(1) Use Geneious to process the raw sequence data, giving 635 bp DNA sequences.

(2) Use MAFFT to align the sequences.

(3) Process them with the mothur pipeline:

mothur > dist.seqs(fasta=AMOA_A2A3.mafft.align.shortname.fasta, output=lt)

0 0 59 0

Output File Name: AMOA_A2A3.mafft.align.shortname.phylip.dist

It took 0 to calculate the distances for 60 sequences.

mothur > cluster(phylip=AMOA_A2A3.mafft.align.shortname.phylip.dist, […]
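In the dist.seqs step above, mothur computes pairwise distances between the aligned sequences, and output=lt writes them as a lower-triangular PHYLIP matrix (sequence count on the first line, then each name followed by its distances to all earlier sequences). The Python sketch below only shows that idea. It uses a simpler gap rule than mothur's default (calc=onegap, where a run of gaps counts as a single difference): any column with a gap in either sequence is skipped, and distance = mismatches / compared columns. Returning 1.0 when nothing can be compared is my own choice.

```python
def pairwise_distance(a, b, gap_chars="-."):
    """Mismatch fraction over columns where neither sequence has a gap."""
    compared = mismatched = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in gap_chars or y in gap_chars:
            continue  # simplified: skip any gapped column
        compared += 1
        if x != y:
            mismatched += 1
    return mismatched / compared if compared else 1.0


def lower_triangle_phylip(names, seqs):
    """Format distances as a lower-triangular PHYLIP matrix."""
    lines = [str(len(names))]
    for i, name in enumerate(names):
        dists = [f"{pairwise_distance(seqs[i], seqs[j]):.6f}" for j in range(i)]
        lines.append("\t".join([name] + dists))
    return "\n".join(lines) + "\n"


# Made-up aligned sequences
print(lower_triangle_phylip(["s1", "s2", "s3"], ["ACGT", "ACGA", "AC-T"]), end="")
```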