testd

Genome annotation and Pangenome analysis

In this demo we will explore how to determine a pangenome from a collection of isolate genome sequences in fasta format

This demo relies on two pieces of software, Prokka and Roary, so please remember to cite them if you end up publishing results obtained with these tools

Obtaining data

For details on obtaining Prokka and Roary, please visit their GitHub repos here and here.

Assuming you have Prokka and Roary installed and in your PATH variable, go ahead and download the six Listeria monocytogenes genomes we are going to use for this demo. From the Terminal:

    wget https://raw.githubusercontent.com/CBIBUNAB/tutorials/master/genomes/GCA_000008285.1_ASM828v1_genomic.fna
    wget https://raw.githubusercontent.com/CBIBUNAB/tutorials/master/genomes/GCA_000021185.1_ASM2118v1_genomic.fna
    wget https://raw.githubusercontent.com/CBIBUNAB/tutorials/master/genomes/GCA_000026705.1_ASM2670v1_genomic.fna
    wget https://raw.githubusercontent.com/CBIBUNAB/tutorials/master/genomes/GCA_000196035.1_ASM19603v1_genomic.fna
    wget https://raw.githubusercontent.com/CBIBUNAB/tutorials/master/genomes/GCA_000168635.2_ASM16863v2_genomic.fna
    wget https://raw.githubusercontent.com/CBIBUNAB/tutorials/master/genomes/GCA_000168815.1_ASM16881v1_genomic.fna

You should get something like the following:

genomes

These genomes correspond to isolates of L. monocytogenes reported in Probing the pan-genome of Listeria monocytogenes: new insights into intraspecific niche expansion and genomic diversification PMID: 20846431.

We selected the following six genomes based on their level of completeness (finished; contigs, etc) and their genotype (type I-III)

Genome Assembly Genome Accession Genotype Sequenced by Status
GCA_000026705 FM242711 type I Institut_Pasteur Finished
GCA_000008285 AE017262 type I TIGR Finished
GCA_000168815 AATL00000000 type I Broad Institute 79 contigs
GCA_000196035 AL591824 type II European Consortium Finished
GCA_000168635 AARW00000000 type II Broad Institute 25 contigs
GCA_000021185 CP001175 type III MSU Finished

Annotating genomes

By annotating the genomes we mean to add information regarding genes, their location, strandedness, and features and attributes. Now that you have the genomes, we need to annotate them to determine the location and attributes of the genes contained in them. We will use Prokka because it’s extremely fast and it performs well, and also because the features file that produces (GFF3) is compatible with Roary.

    prokka --kingdom Bacteria --outdir prokka_GCA_000008285 --genus Listeria --locustag GCA_000008285 GCA_000008285.1_ASM828v1_genomic.fna

Make sure you annotate the six genomes by replacing the -outdir and -locustag and fasta file accordingly. It should take ~ 4 minutes per genome in a standard laptop computer.

You should end up with 11 files including a .gff file.

Prokka output

I’m copying a description of the output files from the Prokka documentation here, but please check with the developers for in-depth documentation.

Output Files

Extension Description
.gff This is the master annotation in GFF3 format, containing both sequences and annotations. It can be viewed directly in Artemis or IGV.
.gbk This is a standard Genbank file derived from the master .gff. If the input to prokka was a multi-FASTA, then this will be a multi-Genbank, with one record for each sequence.
.fna Nucleotide FASTA file of the input contig sequences.
.faa Protein FASTA file of the translated CDS sequences.
.ffn Nucleotide FASTA file of all the annotated sequences, not just CDS.
.sqn An ASN1 format “Sequin” file for submission to Genbank. It needs to be edited to set the correct taxonomy, authors, related publication etc.
.fsa Nucleotide FASTA file of the input contig sequences, used by “tbl2asn” to create the .sqn file. It is mostly the same as the .fna file, but with extra Sequin tags in the sequence description lines.
.tbl Feature Table file, used by “tbl2asn” to create the .sqn file.
.err Unacceptable annotations – the NCBI discrepancy report.
.log Contains all the output that Prokka produced during its run. This is a record of what settings you used, even if the –quiet option was enabled.
.txt Statistics relating to the annotated features found.

GFF files are the input for Roary to compute the pangenome and contain all the annotations plus the genome sequence in fasta format appended at the end.

Determining the pangenome

Let’s put all the .gff files in the same folder (e.g., ./gff) and run Roary

    roary -f ./demo -e -n -v ./gff/*.gff

Roary will get all the coding sequences, convert them into protein, and create pre-clusters. Then, using BLASTP and MCL,Roary will create clusters, and check for paralogs. Finally, Roary will take every isolate and order them by presence/absence of orthologs. The summary output is present in the summary_statistics.txt file. In our case, the results are as follows:

Genes Number
Core genes (99% <= strains <= 100%) 2010
Soft core genes (95% <= strains < 99%) 0
Shell genes (15% <= strains < 95%) 2454
Cloud genes (0% <= strains < 15%) 0
Total genes 4464

Additionally, Roary produces a gene_presence_absence.csv file that can be opened in any spreadsheet software to manually explore the results. In this file, you will find information such as gene name and gene annotation, and, of course, whether a gene is present in a genome or not.

We already have a phylogeny that represents the evolutionary history of this six isolates, where they form clades according to their genotype, i.e., type I isolates together, and so on.

phylogeny

Roary comes with a python script that allows you to generate a few plots to graphically assess your analysis output. Try issuing the following command:

    python roary_plots.py core_gene_alignment.nwk gene_presence_absence.csv

You should get three files: a pangenome matrix, a frequency plot, and a pie chart.

matrixfrequency pie

Pangenome sequence analysis

We have already Genome annotation and Pangenome analysis, but if you wanna know the sequence of a gene in particular in the pangenome you have to search by your own the sequence in the .ffn files. To avoid this inconvenient, Enzo Guerrero-Araya wrote a script in Python3 that make csv files of all loci in the pangenome. The csv’s files can be imported to a database like Sqlite3.

Let’s put all the .ffn files in the same folder (e.g., ./ffn) and run DBGenerator.py in the same directory where is thegene_presence_absence.csv file.

python3 GeneratorDB.py ffn

The script in this version will generate 4 csv files:

Files Description
genomas_locus.csv It contains 2 columns [name of genome, name of locus]
pangenoma.csv It contains all the information of the annotation that Roary reported in thegene_presence_absence.csv file
pangenoma_locus.csv It contains 2 columns [name of gene, name of locus]
locus_sequence.csv It contains 2 columns [name of locus, nucleotide sequence]

Now we have all csv files for make our own database, in terminal you have to type:

sqlite3 database.sqlite

In the sqlite3 prompt rum:

create table genomas_locus (cod text, locus text);
create table pangenoma (gene text, non_unique_gene_name text, annotation text, no_isolates integer, no_sequences integer, avg_sequences_per_isolate integer, genome_fragment integer, order_within_fragment integer, accessory_fragment integer, accessory_order_with_fragment integer, qc text, min_group_size_nuc integer, max_group_size_nuc integer, avg_group_size_nuc integer);
create table pangenoma_locus (gene text, locus text);
create table locus_sequence (locus text, sequence text);

.separator '|'
.import genomas_locus.csv genomas_locus
.import pangenoma.csv pangenoma
.import pangenoma_locus.csv pangenoma_locus
.import locus_sequence.csv locus_sequence

create index genomas_locus_index on genomas_locus(cod, locus);
create index pangenoma_index on pangenoma(gene, non_unique_gene_name, annotation, no_isolates, no_sequences, avg_sequences_per_isolate, genome_fragment, order_within_fragment, accessory_fragment, accessory_order_with_fragment, qc, min_group_size_nuc, max_group_size_nuc, avg_group_size_nuc);
create index pangenoma_locus_index on pangenoma_locus(gene, locus);
create index locus_sequence_index on locus_sequence(locus, sequence);

Now just we have to join tables with sql query like:

select '>'|| cod || '|' || locus_sequence.locus || '|' || pangenoma.gene || x'0a' || sequence
from locus_sequence
inner join pangenoma_locus on locus_sequence.locus = pangenoma_locus.locus
inner join pangenoma on pangenoma_locus.gene = pangenoma.gene
inner join genomas_locus on locus_sequence.locus = genomas_locus.locus
where pangenoma.gene = 'tetC';

>GCA_000008285_01152016|GCA_000008285_02480|tetC
ATGGAAAAGAAGCGGACTCGGGCAGAAGAATTAGGAATAACTAGAAGAAAAATTTTGGATACAGCACGTGATTTATTTATGGAAAAGGGTTACCGGGCAGTTTCAACAAGAGAAATAGCTAAAATTGCCAACATTACCCAACCGGCACTATATCACCACTTTGAAGATAAAGAATCCCTATATATTGAAGTGGTTCGTGAATTGACGCAAAATATCCAAGTGGAAATGCATCCAATTATGCAAGTGACCAAAGCAAAAGAAGAACAATTACATGATATGTTAATTATGTTAATTGAGGAACATCCAACCAATATTCTATTAATGATTCACGATATTCTTAATGAAATGAAACCAGAAAATCAATTTTTACTTTATAAATTATGGCAAAAAACCTATTTGGAACCATTTCAACTATTTTTTGAGCGTCTAGAAAATGCTGGCGAATTGCGTGATGGTGTCAGTGCTGAGACTGCTGCGAGATACTGTTTGTCCACTATTAGCCCTCTTTTTTCTGGGAAAGGCAGCTTTGCGCAAAAGCAAACGACTACAGAACAAATTGATGAATTAATCAACTTAATGATGTTTGGTATATGTAAAAAAGAGGTATAA
>GCA_000021185_01152016|GCA_000021185_00131|tetC
ATGGAAAAGAAGCGGACTCGAGCAGAAGAATTAGGAATAACCAGAAGGAAAATCCTTGATACAGCAAGGGATTTATTTATGGAAAAAGGGTACCGGGCAGTCTCGACAAGAGAAATTGCTAAAATTGCCAAAATTACCCAACCAGCACTTTATCACCATTTTGAAGATAAAGAATCACTTTATATTGAAGTAGTTCGTGAATTGACGCAAAATATTCAAGTGGAAATGCACCCAATTATGCAAACGAGCAAAGCAAAAGAAGAACAACTGCATGATATGTTAATCATGTTAATTGAGGAGCATCCAACCAATATTCTGCTAATGATTCATGATATTCTTAATGAAATGAAGCCAGAAAATCAATTTTTACTTTATAAATTGTGGCAAAAAACCTATTTAGAACCATTTCAAGACTTTTTTGAGCGATTAGAAAATGCTGGCGAATTGCGTGATGGTATCAGTGCTGAGACCGCTGCGAGATACTGTTTATCCACTATTAGCCCGCTTTTTTCAGGGAAAGGTAGCTTTGCGCAAAAGCAAACGACTACAGAACAAATCGATGAATTAATCAACTTAATGATGTTTGGCATATGTAAAAAAGAGGTATAA
>GCA_000026705_01152016|GCA_000026705_02479|tetC
ATGGAAAAGAAGCGGACTCGGGCAGAAGAATTAGGAATAACTAGAAGAAAAATTTTGGATACAGCACGTGATTTATTTATGGAAAAGGGTTACCGGGCAGTTTCAACAAGAGAAATAGCTAAAATTGCCAACATTACCCAACCGGCACTATATCACCACTTTGAAGATAAAGAATCCCTATATATTGAAGTGGTTCGTGAATTGACGCAAAATATCCAAGTGGAAATGCATCCAATTATGCAAGTGACCAAAGCAAAAGAAGAACAATTACATGATATGTTAATTATGTTAATTGAGGAACATCCAACCAATATTCTATTAATGATTCACGATATTCTTAATGAAATGAAACCAGAAAATCAATTTTTACTTTATAAATTATGGCAAAAAACCTATTTGGAACCATTTCAACTATTTTTTGAGCGTCTAGAAAATGCTGGCGAATTGCGTGATGGTGTCAGTGCTGAGACTGCTGCGAGATACTGTTTGTCCACTATTAGCCCTCTTTTTTCTGGGAAAGGCAGCTTTGCGCAAAAGCAAACGACTACAGAACAAATTGATGAATTAATCAACTTAATGATGTTTGGTATATGTAAAAAAGAGGTATAA
>GCA_000168635_01152016|GCA_000168635_02549|tetC
ATGGAAAAGAAGCGGACTCGAGCAGAAGAATTAGGAATAACTAGAAGAAAAATTTTGGATACAGCACGTGATTTATTTATGGAAAAGGGTTACCGGGCAGTTTCTACAAGAGAAATAGCTAAAATTGCTAACATTACCCAACCGGCACTTTATCATCACTTTGAAGATAAAGAATCCCTATATATTGAAGTGGTTCGTGAATTGACGCAAAATATCCAGGTGGAAATGCATCCAATTATGCAAACGAACAAAGCAAAAGAAGAACAATTACATGATATGTTAATTATGTTAATTGAGGAACATCCCACCAATATTCTATTAATGATTCACGATATTCTTAATGAAATGAAACCAGAGAATCAATTTTTACTTTATAAATTATGGCAAAAAACCTATTTAGAACCATTTCAACAATTTTTTGAGCGTCTAGAAAATGCTGGTGAATTGCGTAATGGTATCAGTGCTGAGACCGCTGCAAGATACTGTTTGTCCACTATTAGCCCTCTTTTTTCAGGGAAAGGTAGCTTTGCGCAAAAGCAAACGACTACAGAACAAATCGATGAATTAATCAACTTAATGATGTTTGGCATATGTAAAAAAGAGGTATAA
>GCA_000168815_01152016|GCA_000168815_01572|tetC
ATGGAAAAGAAGCGGACTCGGGCAGAAGAATTAGGAATAACTAGAAGAAAAATTTTGGATACAGCACGTGATTTATTTATGGAAAAGGGTTACCGGGCAGTTTCAACAAGAGAAATAGCTAAAATTGCCAACATTACCCAACCGGCACTATATCACCACTTTGAAGATAAAGAATCCCTATATATTGAAGTGGTTCGTGAATTGACGCAAAATATCCAAGTGGAAATGCATCCAATTATGCAAGTGACCAAAGCAAAAGAAGAACAATTACATGATATGTTAATTATGTTAATTGAGGAACATCCAACCAATATTCTATTAATGATTCACGATATTCTTAATGAAATGAAACCAGAAAATCAATTTTTACTTTATAAATTATGGCAAAAAACCTATTTGGAACCATTTCAACTATTTTTTGAGCGTCTAGAAAATGCTGGCGAATTGCGTGATGGTGTCAGTGCTGAGACTGCTGCGAGATACTGTTTGTCCACTATTAGCCCTCTTTTTTCTGGGAAAGGCAGCTTTGCGCAAAAGCAAACGACTACAGAACAAATTGATGAATTAATCAACTTAATGATGTTTGGTATATGTAAAAAAGAGGTATAA
>GCA_000196035_01152016|GCA_000196035_02552|tetC
ATGGAAAAGAAGCGGACTCGAGCAGAAGAATTAGGAATAACTAGAAGAAAAATTTTGGATACAGCACGTGATTTATTTATGGAAAAGGGTTACCGGGCAGTTTCTACAAGAGAAATAGCTAAAATTGCCAACATTACCCAACCGGCACTGTATCATCACTTTGAAGATAAAGAATCCCTATATATTGAAGTGGTTCGTGAATTGACGCAAAATATCCAGGTGGAAATGCATCCAATTATGCAAACGAACAAAGCAAAAGAAGAACAATTACATGATATGTTAATTATGTTAATTGAGGAACATCCCACCAATATTCTATTAATGATTCACGATATTCTTAATGAAATGAAACCAGAGAATCAATTTTTACTTTATAAATTATGGCAAAAAACCTATTTAGAACCATTTCAACAATTTTTTGAGCGTCTAGAAAATGCTGGTGAATTGCGTAATGGTATCAGTGCTGAGACCGCTGCAAGATACTGTTTGTCCACTATTAGCCCTCTTTTTTCAGGGAAAGGTAGCTTTGCGCAAAAGCAAACGACTACAGAACAAATCGATGAATTAATCAACTTAATGATGTTTGGCATATGTAAAAAAGAGGTATAA

And thats its all. we get all sequences in fasta format of tetC gene.

Citation

Seemann T.
Prokka: rapid prokaryotic genome annotation
Bioinformatics 2014 Jul 15;30(14):2068-9.
PMID:24642063

Andrew J. Page, Carla A. Cummins, Martin Hunt, Vanessa K. Wong, Sandra Reuter, Matthew T. G. Holden, Maria Fookes, Daniel Falush, Jacqueline A. Keane, Julian Parkhill.
Roary: Rapid large-scale prokaryote pan genome analysis
Bioinformatics 2015 Jul 20. pii: btv421
PMID: 26198102

Evo-Ed: The Evolution of Citrate Use in E. coli

ng the Cit+ E. coli access to the energy-rich citrate resource.

via Evo-Ed: The Evolution of Citrate Use in E. coli.

linux 依赖关系冲突 解决办法

用aptitude 工具可以搞定
先:sudo apt-get install aptitude
然后:
sudo aptitude install python-dev
下列“新”软件包将被安装。
python-dev python2.7-dev{ab}
0 个软件包被升级,新安装 2 个, 0 个将被删除, 同时 0 个将不升级。
需要获取 29.5 MB 的存档。 解包后将要使用 39.3 MB。
下列软件包存在未满足的依赖关系:
python2.7-dev : 依赖: python2.7 (= 2.7.3-0ubuntu3) 但是 2.7.3-0ubuntu3.1 已安装。
依赖: libpython2.7 (= 2.7.3-0ubuntu3) 但是 2.7.3-0ubuntu3.1 已安装。
依赖: libexpat1-dev 但它将不会被安装。
依赖: libssl-dev 但它将不会被安装。
下列动作将解决这些依赖关系:

保持 下列软件包于其当前版本:
1) python-dev [未安装的]
2) python2.7-dev [未安装的]

是否接受该解决方案?[Y/n/q/?] n ###attention ! choose “no” here
下列动作将解决这些依赖关系:

安装 下列软件包:
1) libexpat1-dev [2.0.1-7.2ubuntu1 (precise)]
2) libssl-dev [1.0.1-4ubuntu3 (precise)]
3) libssl-doc [1.0.1-4ubuntu3 (precise)]

降级 下列软件包:
4) libexpat1 [2.0.1-7.2ubuntu1.1 (now) -> 2.0.1-7.2ubuntu1 (precise)]
5) libpython2.7 [2.7.3-0ubuntu3.1 (now) -> 2.7.3-0ubuntu3 (precise)]
6) libssl1.0.0 [1.0.1-4ubuntu5.7 (now) -> 1.0.1-4ubuntu3 (precise)]
7) python2.7 [2.7.3-0ubuntu3.1 (now) -> 2.7.3-0ubuntu3 (precise)]
8) python2.7-minimal [2.7.3-0ubuntu3.1 (now) -> 2.7.3-0ubuntu3 (precise)]

是否接受该解决方案?[Y/n/q/?] y
下列软件包将被“降级”:
libexpat1 libpython2.7 libssl1.0.0 python2.7 python2.7-minimal
下列“新”软件包将被安装。
libexpat1-dev{a} libssl-dev{a} libssl-doc{a} python-dev python2.7-dev{a}
0 个软件包被升级,新安装 5 个, 5 个被降级, 0 个将被删除, 同时 0 个将不升级。
需要获取 39.0 MB 的存档。 解包后将要使用 47.8 MB。
您要继续吗?[Y/n/?]Y

RNAmmer | The Devil in the Details

find ribosomal RNAs in genomic sequences

via RNAmmer | The Devil in the Details.

find ribosomal RNAs in genomic sequences

BMC Genomics | Full text | Comparative genomic analysis of Klebsiella pneumonia LCT-KP214 and a mutant strain LCT-KP289 obtained after spaceflight

BMC Genomics | Full text | Comparative genomic analysis of Klebsiella pneumonia LCT-KP214 and a mutant strain LCT-KP289 obtained after spaceflight.

Amber的分子动力学模拟 (转贴)

amber进行MD的步骤如下:PDB文件需要去掉结晶水、氢 和金属离子(非必须)。
网上教程:http://enzyme.fbb.msu.ru/Tutorials/

金属酶的模拟:http://ambermd.org/tutorials/advanced/tutorial1_orig/
蛋白和配体的模拟:

1:分子系统的准备

   命令如下:
tleap                         //进入leap
source  leaprc.ff99SB         //加载蛋白力场
source  leaprc.gaff           //如果有小分子加载小分子力场
loadamberparams lig.frcmod //特殊情况还要加载其他特殊的力场参数
mol = loadpdb  protein.pdb    //定义mol变量并加载蛋白分子
bond mol.1.SG mol.6.SG        //如果要二硫键。
check mol                     //检查蛋白
saveamberparm mol protein_vac.top  protein_vac.crd  //生成真空拓扑、坐标文件
solvatebox mol  TIP3PBOX 10.0     //此为方形,八面体用solvateoct
charge mol                     //检测系统电荷

    addions mol Na+  1             //加入+电荷
    addions mol CL-  1             //加入-电荷
    saveamberparm mol protein_sol.top  protein_sol.crd  //生成拓扑、坐标文件
    quit                           //退出 leap
    ambpdb -p protein_sol.top <protein_sol.crd>   protein_sol.pdb //生成体系pdb便于查看。

说明:
        <1>PDB格式文件里面规定了二硫键,在amber生成参数文件时候,要用bond命令将二硫键固定。
SSBOND   1 CME A  381    CME A  381
SSBOND   2 CME A  417    CME A  417
<2> 特殊配体参数文件的制备
antecharmber -i lig.pdb -fi pdb -o lig.mol2 -fo mol2 -c bcc(或者gas) -s 2
parmchk -i lig.mol2 -f mol2 -o lig.frcmod
source leaprc.gaff
LIG = loadmol2 lig.mol2

          check LIG
           loadamberparm lig.frcmod
           check LIG
           saveamberparm LIG lig.top lig.crd
          <3> 特殊配体参数文件的制备
          amber 加力场时如出现   Created a new atom named : .R<ILE 166> .A<CD1 14>
          表示amber可能不识别这些原子。可以在pdb中删除让amber自动添加。
          <4> Amber 力场
 ff10力场(parm10.dat):对ff99的各种参数补丁的集合,相当于parm99.dat+frcmod.ff03+bsc0+chi.OL3+新的离子参数+原子和残基名的修改以顺应PDB format version 3。这是目前最好的amber力场。
ff03.r1力场(parm99.dat+frcmod.ff03):ff99力场的修改版。获取电荷时通过连续介电模型表现溶剂可极化效应,修改了蛋白phi、psi骨架参数,减少了对螺旋构象的偏爱。核酸参数相对于ff99没变。ff03.r1与amber9中的ff03略有不同,那时仍用的是ff94的方法得来的碳、氮端基原子电荷,如果仍想用那时代的ff03就调用oldff/leaprc.ff03.
ff03ua力场(parm99.dat+frcmod.ff03+frcmod.ff03ua):ff03力场的united-atom版本,侧链的氢原子被united了,骨架上的氢原子和芳香环上的氢原子仍被保留。由于骨架还是全原子故骨架势参数没变,侧链上的参数因用了united故重新拟合。核酸参数完全没变,且还是全原子。
ff02力场(parm99.dat+frcmod.ff02pol.r1):ff99力场的可极化版,给原子上增加了可极化的偶极子。frcmod.ff02pol.r1是对原ff02的扭转参数的修正。
ff02EP力场(parm99EP.dat+frcmod.ff02pol.r1):ff02力场基础上给诸如氧、氮、硫原子增加了偏离原子中心的点电荷以表现孤对电子效应。据称比ff02稍好点。
ff99力场(parm99.dat):大部分参数来自ff94力场,修改了许多扭转角的参数。甘氨酸的骨架参数有问题,螺旋和延展构象的平衡性不对。而对于DNA,ff99长时间模拟中亚稳态占统治地位,即alpha和gamma二面角倾向于分别为gauche+和trans状态。虽然在RNA中也有这问题,但不严重。ff99的这些毛病在ff94里也有。
ff99SB力场(parm99.dat+frcmod.ff99SB):对ff99的蛋白二面角参数进行修正,二级结构间分布的比例得到了改善,也解决了甘氨酸骨架参数问题。
bsc0(frcmod.parmbsc0):解决上述ff99在核酸模拟问题上的补丁,同时还改进了RNA的糖苷的gamma二面角扭转势。可参考http://mmb.pcb.ub.es/PARMBSC0。
ff99SB+bsc0力场:把bsc0补丁用到ff99SB上,相对于ff99同时增进对蛋白和核酸的效果。这个组合使gamma二面角过分偏离了trans型。如果初始结构有很多gamma角为trans的情况,还是用ff99比较好。
ff99SBildn(frcmod.ff99SBildn):在ff99SB基础上修改氨基酸侧链参数的补丁。
ff99SBnmr(frcmod.ff99SBnmr):在ff99SB基础上修改骨架扭转项参数以更符合NMR数据的补丁。
ff98力场(parm98.dat):对ff94改进了糖苷的扭转角参数。
ff96力场(parm96.dat):与ff94扭转角不同,算出来的能量更接近量化结果。来自Beachy et al,由于构象有明显偏向beta等问题,使用不广泛。
ff94力场(parm94.dat):来自Cornell, Kollman et al。适合溶剂环境。电荷由RESP HF/6-31G*获得。
ff86力场(parm91X.dat):将ff84扩展为全原子力场。和ff84一样对氢键也是用Lennard-Jones 10-12势,故如果想在sander里用ff84/86,得重新带着-DHAS_10_12选项编译。之所以相应的文件叫parm91X是因为对原始ff86做了一些修正。(parm91X.dat是parm91.dat的补完版,加入了一些非键项,但非键项比如Mg、I等的参数都没调好,只是近似。)
ff84(parm91X.ua.dat):最早的AMBER力场,用于模拟核酸和蛋白质的联合原子力场。不推荐使用,但在真空或者距离依赖的介电常数下模拟还有用。
parmAM1和parmPM3力场(parmAM1.dat/parmPM3.dat):用这个参数对蛋白质优化可以得出与AM1/PM3相同的优化结果。如今已没什么价值。
GAFF力场(gaff.dat)=Generation Amber Force Field:普适型有机小分子力场,函数形式和AMBER力场相同,与AMBER力场完全兼容。
GLYCAM-06力场(GLYCAM_06g.dat):对以前GLYCAM力场做了改进,并且纳入了一小部分脂类的参数。
GLYCAM-04EP力场(GLYCAM04EP.dat):将GLYCAM04扩展到可用于TIP5P模型下的模拟。给氧加上非原子中心点电荷表现孤对电子效应。
GLYCAM-04力场(GLYCAM04.dat)=glycans and glycoconjugates in AMBER:专用于糖的模拟,和AMBER完全兼容,可一起用于糖蛋白的模拟。官网:http://glycam.ccrc.uga.edu/ccrc/index.jsp
另外,Amber程序还支持AMOEBA力场,也可以通过自带的CHAMBER工具来支持CHARMM力场,这里就不提了。

2:溶剂优化
模拟的参数文件如下:min1.in
protein: initial minimisation solvent + ions
&cntrl
imin = 1,                 //任务是优化, 0为分子动力学
maxcyc = 1000,      //优化步数
ncyc = 500,             // 前500为最陡下降 后500为共轭梯度
ntb = 1,                  // 周期边界条件 0 不采用  1 定容  2 定压

ntr = 1,                   //优化时需要一些约束原子  -ref
cut = 10                  //非键相互作用阈值为 10唉
/
Hold the protein fixed   //约束说明
500.0                            //作用在肽键上的力
RES 1 194                    //限制的残基序号
END
END
    红色部分表示限制蛋白部分。其中500.0单位是kcal/mol,表示作用在肽链上使其不动的力。“RES 1 194”表示肽链残基数目,因为我们学习使用的protein有194个残基。
    模拟命令如下:
sander –O –i min1.in –o min1.out –p protein_sol.top –c protein_sol.crd –r protein_min1.rst –ref protein_sol.crd &
3:蛋白的优化 min2.in
protein: initial minimisation whole system
&cntrl
imin = 1,
maxcyc = 2500,
ncyc = 1000,
ntb = 1,
ntr = 0,
cut = 10
/
命令如下:
sander –O –i min2.in –o min2.out –p protein_sol.top –c protein_min1.rst –r protein_min2.rst &
4: 有限分子动力学模拟
       在这个步骤中,我们将主要目的是对特定的原子使用作用力使其能量优化。我们要优化溶剂环境,至少需要10ps,我们将使用20ps用来优化我们上两步制作的分子系统的周期性边界的溶剂环境。
 命令配置文件md1.in如下:
protein: 20ps MD with res on protein
&cntrl
imin = 0,  irest = 0,   ntx = 1,           //表示模拟过程为分子动力学,不是能量最优化。
nstlim = 10000, dt = 0.002,            //表示计算的步数。
ntc = 2,             //1表示不实用使用,2表示氢键将被计算,3表示所有键都将被计算在内。
ntf = 2,
cut = 10,
ntb = 1,                  // 表示分子动力学过程保持体积固定。
ntr = 1,
tautp = 0.1,           //热浴时间常数,缺省为1.0。小的时间常数可以得到较好的耦联。
ntpr = 100,
ntwx = 500,
ntwr = 1000
ntt = 3, 
                     //温度转变控制,3表示使用兰格氏动力学
gamma_ln = 1.0,    //表示当ntt=3时的碰撞频率,单位为ps-1(请参考AMBER手册)  
tempi = 0.0,           //系统开始时的温度。
 temp0 = 300.0,     //表示最后系统到达并保持的温度,单位为K。 
 /
keep protein fixed with weak restraints
10.0
RES 1 194   //蛋白的残基数
END
END
    我们将使用一个较小的作用力,10kcal/mol。在分子动力学中,当ntr=1时,作用力只需要5-10kcal/mol(我们需要引用一个坐标文件做分子动力学过程的比较,我们需要使用”-ref”参数)。太大的作用力同时使用Shake算法和2fs步长将使整个系统变得不稳定,因为大的作用力使系统中的原子产生大频率的振动,模拟过程并步需要。

运行命令如下:

sander –O –i md1.in –o md1.out –p protein_sol.top –c protein_min2.rst –r protein_md1.rst –x protein_md1.mdcrd –ref protein_min2.rst –inf md1.info&
5、生成相分子动力学模拟
    这一步进行整个系统的分子动力学模拟,而不对某些特定原子位置进行限制。配置文件md2.in如下:
protein: 250ps MD
&cntrl
imin = 0, irest = 1, ntx = 7,
ntb = 2, pres0 = 1.0, ntp = 1,
taup = 2.0,
cut = 10, ntr = 0,
ntc = 2, ntf = 2,
tempi = 300.0, temp0 = 300.0,
ntt = 3, gamma_ln = 1.0,
nstlim = 125000, dt = 0.002,
ntpr = 100, ntwx = 500, ntwr = 1000
/
使用以下命令进行MD:
sander –O –i md2.in –o md2.out –p protein.top –c protein_md1.rst – r protein_md2.rst –x protein_md2.mdcrd –ref protein_md1.rst –inf md2.info &
6:Amber模拟数据的分析
    模拟完成后,接下来要对得到的数据进行分析。主要数据文件.out文件,包含系统能量、温度,压力等等;.mdcrd文件,是分子动力学轨迹文件,可以求系统蛋白的RMSD,回转半径等等。    

pseudopipe pipeline (predict Pseudogenes)

shenzy@shenzy-pc:~/zhanglei/pgenes/ppipe_input/prodigal/input_file$ /usr/local/RepeatModeler/BuildDatabase -name tcs2 -engine ncbi /home/shenzy/zhanglei/pgenes/ppipe_input/prodigal/input_file/tcs2_prodigal.fa
Building database tcs2:
Adding /home/shenzy/zhanglei/pgenes/ppipe_input/prodigal/input_file/tcs2_prodigal.fa to database
Number of sequences (bp) added to database: 4407 ( 4099764 bp )
shenzy@shenzy-pc:~/zhanglei/pgenes/ppipe_input/prodigal/input_file$ /usr/local/RepeatModeler/BuildDatabase -name tcs3 -engine ncbi /home/shenzy/zhanglei/pgenes/ppipe_input/prodigal/input_file/tcs3_prodigal.fa
Building database tcs3:
Adding /home/shenzy/zhanglei/pgenes/ppipe_input/prodigal/input_file/tcs3_prodigal.fa to database
Number of sequences (bp) added to database: 4430 ( 4121190 bp )
shenzy@shenzy-pc:~/zhanglei/pgenes/ppipe_input/prodigal/input_file$ /usr/local/RepeatModeler/BuildDatabase -name tcs4 -engine ncbi /home/shenzy/zhanglei/pgenes/ppipe_input/prodigal/input_file/tcs4_prodigal.fa
Building database tcs4:
Adding /home/shenzy/zhanglei/pgenes/ppipe_input/prodigal/input_file/tcs4_prodigal.fa to database

nohup /usr/local/RepeatModeler/RepeatModeler -database tcs4 >& run4.out &
/home/shenzy/lib/RepeatMasker/RepeatMasker -lib consensi.fa.classified tcs4_genome.fasta &
./pseudopipe.sh /home/shenzy/lib/pgenes/ppipe_output/caenorhabditis_elegans_62_220a/ /home/shenzy/lib/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/dna/dna_rm.fa /home/shenzy/lib/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/dna/Caenorhabditis_elegans.WS220.62.dna.chromosome.%s.fa /home/shenzy/lib/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/pep/Caenorhabditis_elegans.WS220.62.pep.fa /home/shenzy/lib/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/mysql/chr%s_exLocs 0

 

./pseudopipe.sh /home/shenzy/lib/pgenes/ppipe_output/caenorhabditis_elegans_62_220a /home/shenzy/lib/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/dna/dna_rm.fa /home/shenzy/lib/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/dna/Caenorhabditis_elegans.WS220.62.dna.chromosome.%s.fa /home/shenzy/lib/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/pep/Caenorhabditis_elegans.WS220.62.pep.fa /home/shenzy/lib/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/mysql/chr%s_exLocs 0
./pseudopipe.sh /home/shenzy/lib/pgenes/ppipe_output/prodigal/tcs4/ /home/shenzy/lib/pgenes/ppipe_input/prodigal/tcs4/dna/tcs4_genome.fasta.masked /home/shenzy/lib/pgenes/ppipe_input/prodigal/tcs4/dna/tcs4_genome.%s.fasta /home/shenzy/lib/pgenes/ppipe_input/prodigal/tcs4/pep/tcs4_prodigal.pep /home/shenzy/lib/pgenes/ppipe_input/prodigal/tcs4/mysql/contig.%s_exlocs 0

 

ubuntu 12.04 LTS 如何使用更快的更新源_百度经验

# 电子科大(教育网用户推荐)

deb http://ubuntu.uestc.edu.cn/ubuntu/ precise main restricted universe multiverse

deb http://ubuntu.uestc.edu.cn/ubuntu/ precise-backports main restricted universe multiverse

deb http://ubuntu.uestc.edu.cn/ubuntu/ precise-proposed main restricted universe multiverse

deb http://ubuntu.uestc.edu.cn/ubuntu/ precise-security main restricted universe multiverse

deb http://ubuntu.uestc.edu.cn/ubuntu/ precise-updates main restricted universe multiverse

deb-src http://ubuntu.uestc.edu.cn/ubuntu/ precise main restricted universe multiverse

deb-src http://ubuntu.uestc.edu.cn/ubuntu/ precise-backports main restricted universe multiverse

deb-src http://ubuntu.uestc.edu.cn/ubuntu/ precise-proposed main restricted universe multiverse

deb-src http://ubuntu.uestc.edu.cn/ubuntu/ precise-security main restricted universe multiverse

deb-src http://ubuntu.uestc.edu.cn/ubuntu/ precise-updates main restricted universe multiverse

# 中国科技大学(教育网用户推荐)

deb http://debian.ustc.edu.cn/ubuntu/ precise main restricted universe multiverse

deb http://debian.ustc.edu.cn/ubuntu/ precise-backports restricted universe multiverse

deb http://debian.ustc.edu.cn/ubuntu/ precise-proposed main restricted universe multiverse

deb http://debian.ustc.edu.cn/ubuntu/ precise-security main restricted universe multiverse

deb http://debian.ustc.edu.cn/ubuntu/ precise-updates main restricted universe multiverse

deb-src http://debian.ustc.edu.cn/ubuntu/ precise main restricted universe multiverse

deb-src http://debian.ustc.edu.cn/ubuntu/ precise-backports main restricted universe multiverse

deb-src http://debian.ustc.edu.cn/ubuntu/ precise-proposed main restricted universe multiverse

deb-src http://debian.ustc.edu.cn/ubuntu/ precise-security main restricted universe multiverse

deb-src http://debian.ustc.edu.cn/ubuntu/ precise-updates main restricted universe multiverse

# 北京理工(教育网用户推荐)

deb http://mirror.bjtu.edu.cn/ubuntu/ precise main multiverse restricted universe

deb http://mirror.bjtu.edu.cn/ubuntu/ precise-backports main multiverse restricted universe

deb http://mirror.bjtu.edu.cn/ubuntu/ precise-proposed main multiverse restricted universe

deb http://mirror.bjtu.edu.cn/ubuntu/ precise-security main multiverse restricted universe

deb http://mirror.bjtu.edu.cn/ubuntu/ precise-updates main multiverse restricted universe

deb-src http://mirror.bjtu.edu.cn/ubuntu/ precise main multiverse restricted universe

deb-src http://mirror.bjtu.edu.cn/ubuntu/ precise-backports main multiverse restricted universe

deb-src http://mirror.bjtu.edu.cn/ubuntu/ precise-proposed main multiverse restricted universe

deb-src http://mirror.bjtu.edu.cn/ubuntu/ precise-security main multiverse restricted universe

deb-src http://mirror.bjtu.edu.cn/ubuntu/ precise-updates main multiverse restricted universe

# 163(非教育网推荐)

deb http://mirrors.163.com/ubuntu/ precise main restricted

deb-src http://mirrors.163.com/ubuntu/ precise main restricted

deb http://mirrors.163.com/ubuntu/ precise-updates main restricted

deb-src http://mirrors.163.com/ubuntu/ precise-updates main restricted

deb http://mirrors.163.com/ubuntu/ precise universe

deb-src http://mirrors.163.com/ubuntu/ precise universe

deb http://mirrors.163.com/ubuntu/ precise-updates universe

deb-src http://mirrors.163.com/ubuntu/ precise-updates universe

deb http://mirrors.163.com/ubuntu/ precise multiverse

deb-src http://mirrors.163.com/ubuntu/ precise multiverse

deb http://mirrors.163.com/ubuntu/ precise-updates multiverse

deb-src http://mirrors.163.com/ubuntu/ precise-updates multiverse

deb http://mirrors.163.com/ubuntu/ precise-backports main restricted universe multiverse

deb-src http://mirrors.163.com/ubuntu/ precise-backports main restricted universe multiverse

deb http://mirrors.163.com/ubuntu/ precise-security main restricted

deb-src http://mirrors.163.com/ubuntu/ precise-security main restricted

deb http://mirrors.163.com/ubuntu/ precise-security universe

deb-src http://mirrors.163.com/ubuntu/ precise-security universe

deb http://mirrors.163.com/ubuntu/ precise-security multiverse

deb-src http://mirrors.163.com/ubuntu/ precise-security multiverse

deb http://extras.ubuntu.com/ubuntu precise main

deb-src http://extras.ubuntu.com/ubuntu precise main

# sohu(非教育网推荐)

deb http://mirrors.sohu.com/ubuntu/ precise main restricted

deb-src http://mirrors.sohu.com/ubuntu/ precise main restricted

deb http://mirrors.sohu.com/ubuntu/ precise-updates main restricted

deb-src http://mirrors.sohu.com/ubuntu/ precise-updates main restricted

deb http://mirrors.sohu.com/ubuntu/ precise universe

deb-src http://mirrors.sohu.com/ubuntu/ precise universe

deb http://mirrors.sohu.com/ubuntu/ precise-updates universe

deb-src http://mirrors.sohu.com/ubuntu/ precise-updates universe

deb http://mirrors.sohu.com/ubuntu/ precise multiverse

deb-src http://mirrors.sohu.com/ubuntu/ precise multiverse

deb http://mirrors.sohu.com/ubuntu/ precise-updates multiverse

deb-src http://mirrors.sohu.com/ubuntu/ precise-updates multiverse

deb http://mirrors.sohu.com/ubuntu/ precise-backports main restricted universe multiverse

deb-src http://mirrors.sohu.com/ubuntu/ precise-backports main restricted universe multiverse

deb http://mirrors.sohu.com/ubuntu/ precise-security main restricted

deb-src http://mirrors.sohu.com/ubuntu/ precise-security main restricted

deb http://mirrors.sohu.com/ubuntu/ precise-security universe

deb-src http://mirrors.sohu.com/ubuntu/ precise-security universe

deb http://mirrors.sohu.com/ubuntu/ precise-security multiverse

deb-src http://mirrors.sohu.com/ubuntu/ precise-security multiverse

deb http://extras.ubuntu.com/ubuntu precise main

deb-src http://extras.ubuntu.com/ubuntu precise main

via ubuntu 12.04 LTS 如何使用更快的更新源_百度经验.

Microbiome | Abstract | MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm

Microbiome | Abstract | MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm.