The SILVA and “All-species Living Tree Project (LTP)” taxonomic frameworks

SILVA (from Latin silva, forest, http://www.arb-silva.de) is a comprehensive resource for up-to-date quality-controlled databases of aligned ribosomal RNA (rRNA) gene sequences from the Bacteria, Archaea and Eukaryota domains and supplementary online services. SILVA provides a manually curated taxonomy for all three domains of life, based on representative phylogenetic trees for the small- and large-subunit rRNA […]

GC skew in microbial genomes (repost)

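Given the two keywords "bioinformatics" and "GC", most people would probably think first of GC content or CpG islands. Over the past two weeks I have started working on non-coding RNA prediction (the target organism is Sinorhizobium meliloti) and came across a "GC theory" I had not heard of before: GC skew (GC-skew). Searching the Chinese-language literature, I found almost no detailed introduction to it. There is also no established Chinese term for it; since "skew" means slanted, biased or tilted, and based on my understanding of the theory, I render GC-skew as “GC偏移”. Below I translate a Review from Nature to share with everyone.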

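In most bacterial genomes there is a marked difference in base composition between the leading strand and the lagging strand: the leading strand is enriched in G and T, whereas the lagging strand contains more A and C. Deviations from the base frequencies A=T and C=G are called AT skew (AT-skew) and GC skew (GC-skew). Because GC skew is usually more pronounced than AT skew, often only GC skew is considered. One way to measure GC skew is to slide a window along the sequence, compute (G-C)/(G+C) for each window and plot the result. This quantity gives the percentage excess of G over C: positive values indicate the leading strand and negative values the lagging strand.

(Image source: Nature.com)

What causes GC skew? Little is known about this. One possibility is that the leading and lagging strands spend different amounts of time in the single-stranded state during replication, so they are exposed to different DNA damage and subject to different mutational pressures. Because T-G and G-T mispairs are more frequent than C-A and A-C mispairs, the more error-prone strand may become relatively enriched in G and T. Another theory rests on the hydrolytic deamination of cytosine, a process that occurs predominantly in single-stranded DNA. The asymmetric structure of the replication fork leaves the lagging-strand template transiently single-stranded, which makes it more susceptible to cytosine deamination. Deamination of cytosine produces uracil, which pairs with adenine during replication, in effect causing a C to T mutation. C to T deamination therefore increases the relative G and T content of that strand and the relative C and A content of its complementary strand.

Why is GC skew analysis important? Because GC skew is positive on the leading strand and negative on the lagging strand, its sign signals where the leading strand begins, where it ends and where it switches to the lagging strand, and vice versa. This makes GC skew a useful tool for marking the origin and terminus of circular chromosomes. Conspicuous local changes in the plot can mark, for example, recent rearrangements such as sequence inversions, or the integration of foreign DNA. Loss of DNA does not change the basic shape of the GC skew curve, although recently incorporated foreign DNA may affect the local variance.

In practice, plots of GC skew suffer from local fluctuations. It is therefore better to use the cumulative GC skew, which is the sum of the GC skew values of adjacent sliding windows from an arbitrary starting point to a given point in the sequence. The figure shows the GC skew and the cumulative GC skew for the genome of Wolinella succinogenes DSM1740 and illustrates how the GC skew changes sign at the replication origin and terminus. The cumulative GC skew reaches its extreme values, a maximum and a minimum, at these positions.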
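The sliding-window calculation and its cumulative sum can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `gc_skew` and `cumulative_gc_skew`, the non-overlapping default step, the choice to report 0.0 for windows without any G or C, and the toy sequence are all made up for this example and do not come from the original article.

```python
from itertools import accumulate


def gc_skew(seq, window=1000, step=None):
    """Return one (G - C) / (G + C) value per window along seq.

    Assumptions for this sketch: windows do not overlap unless a smaller
    step is given, and a window with no G or C is reported as 0.0.
    """
    step = step or window
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        values.append((g - c) / (g + c) if g + c else 0.0)
    return values


def cumulative_gc_skew(skews):
    """Running sum of per-window skew values, starting at the first window."""
    return list(accumulate(skews))


# Hypothetical toy sequence: a G-rich half followed by a C-rich half.
toy = "GGGGTTAT" * 4 + "CCCCAATA" * 4
skews = gc_skew(toy, window=8)
cumulative = cumulative_gc_skew(skews)

# Indices of the windows where the cumulative curve peaks and bottoms out.
peak_window = cumulative.index(max(cumulative))
trough_window = cumulative.index(min(cumulative))
```

On the toy sequence the per-window skew flips from positive to negative halfway along, and the cumulative curve turns at that same point. On a real genome one would use a much larger window and treat the extremes of the cumulative curve only as candidate positions for the origin and terminus.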

Source: http://www.nature.com/nrmicro/journal/v2/n11/box/nrmicro1024_BX1.html

[…]

RDP Tutorials (16S Analysis)

Contents

 

Workflows:

Processing 16S rRNA data using an unsupervised method

Processing 16S rRNA data using a supervised method

Processing functional gene data using a supervised method

Individual tools:

Using the Pipeline Initial Process

Align 16S rRNA sequences using Infernal Aligner

Using the RDP Classifier

Using the RDP MultiClassifier

Performing Complete Linkage Clustering

–Using the […]

454 pyrosequencing analysis pipeline

mothur > sffinfo(sff=454Reads_archaea.sff, flow=T)
Extracting info from 454Reads_archaea.sff …
10000
20000
30000
40000
50000
60000
70000
80000
90000
92115
It took 68 secs to extract 92115.
Output File Names:
454Reads_archaea.fasta
454Reads_archaea.qual
454Reads_archaea.flow

mothur > trim.flows(flow=454Reads_archaea.flow, oligos=oligos_LXY.txt, pdiffs=2, bdiffs=1, processors=2)
Appending files from process 15674

Output File Names:
454Reads_archaea.trim.flow
454Reads_archaea.scrap.flow
454Reads_archaea.GZ_ARC.flow
454Reads_archaea.GZ1122_ARC.flow
454Reads_archaea.GZ1122cellulose_ARC.flow
454Reads_archaea.GZ_xylan_ARC.flow
454Reads_archaea.GZ_cellulose55_ARC.flow
454Reads_archaea.SHX_xylan_ARC.flow
[…]

MetaPhlAn: Metagenomic Phylogenetic Analysis

MetaPhlAn is a computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data. MetaPhlAn relies on unique clade-specific marker genes identified from 3,000 reference genomes, allowing:

up to 25,000 reads-per-second (on one CPU) analysis speed (orders of magnitude faster compared to existing methods); unambiguous taxonomic assignments as the MetaPhlAn markers are […]

DySC: software for greedy clustering of 16S rRNA reads

Summary: Pyrosequencing technologies are frequently used for sequencing the 16S ribosomal RNA marker gene for profiling microbial communities. Clustering of the produced reads is an important but time-consuming task. We present Dynamic Seed-based Clustering (DySC), a new tool based on the greedy clustering approach that uses a dynamic seeding strategy. Evaluations based on the […]