<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>小生这厢有礼了(BioFaceBook Personal Blog) &#187; metagenome</title>
	<atom:link href="https://www.biofacebook.com/?feed=rss2&#038;tag=metagenome" rel="self" type="application/rss+xml" />
	<link>https://www.biofacebook.com</link>
	<description>Notes from my bioinformatics journey (NGS, Genome, Meta, Linux)</description>
	<lastBuildDate>Sun, 23 Aug 2020 03:28:53 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.41</generator>
	<item>
		<title>The SILVA and “All-species Living Tree Project (LTP)” taxonomic frameworks</title>
		<link>https://www.biofacebook.com/?p=852</link>
		<comments>https://www.biofacebook.com/?p=852#comments</comments>
		<pubDate>Fri, 06 Dec 2013 07:45:24 +0000</pubDate>
		<dc:creator><![CDATA[szypanther]]></dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[genome]]></category>
		<category><![CDATA[metagenome]]></category>

		<guid isPermaLink="false">http://www.biofacebook.com/?p=852</guid>
		<description><![CDATA[<p>SILVA (from Latin silva, forest, http://www.arb-silva.de) is a comprehensive resource for up-to-date quality-controlled databases of aligned ribosomal RNA (rRNA) gene sequences from the Bacteria, Archaea and Eukaryota domains and supplementary online services. SILVA provides a manually curated taxonomy for all three domains of life, based on representative phylogenetic trees for the small- and large-subunit rRNA [...]]]></description>
				<content:encoded><![CDATA[<p>SILVA (from Latin silva, forest, <a href="http://www.arb-silva.de/">http://www.arb-silva.de</a>) is a comprehensive resource for up-to-date quality-controlled databases of aligned ribosomal RNA (rRNA) gene sequences from the Bacteria, Archaea and Eukaryota domains and supplementary online services. SILVA provides a manually curated taxonomy for all three domains of life, based on representative phylogenetic trees for the small- and large-subunit rRNA genes. This article describes the improvements the SILVA taxonomy has undergone in the last 3 years. Specifically, we focus on the curation process, the various resources used for curation and the comparison of the SILVA taxonomy with Greengenes and RDP-II taxonomies. Our comparisons not only revealed a reasonable overlap between the taxa names, but also pointed to significant differences in both names and numbers of taxa between the three resources.</p>
]]></content:encoded>
			<wfw:commentRss>https://www.biofacebook.com/?feed=rss2&#038;p=852</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>GC skew in microbial genomes (repost)</title>
		<link>https://www.biofacebook.com/?p=759</link>
		<comments>https://www.biofacebook.com/?p=759#comments</comments>
		<pubDate>Mon, 29 Apr 2013 03:11:09 +0000</pubDate>
		<dc:creator><![CDATA[szypanther]]></dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[metagenome]]></category>
		<category><![CDATA[NGS]]></category>

		<guid isPermaLink="false">http://www.biofacebook.com/?p=759</guid>
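		<description><![CDATA[<p>Given the two keywords "bioinformatics" and "GC", many people's first thought is probably "GC content" or "CpG island". Over the past two weeks I started working on non-coding RNA prediction in Sinorhizobium meliloti (Chinese name 草木樨中华根瘤菌) and came across a "GC concept" I had never heard of before: GC skew. I could find hardly any detailed introduction to it in the Chinese-language literature, nor an established Chinese translation ("skew" means "askew, biased, slanted", so based on my understanding of the concept I rendered GC-skew as "GC偏移", roughly "GC shift"). Below is my translation of a Nature Review box, shared here for everyone. </p> <p> GC skew in microbial genomes In most bacterial genomes, the leading strand and the lagging strand differ markedly in base composition: the leading strand is rich in G and T, whereas the lagging strand has more A and C. These departures from the base-frequency parity A = T and C = G are called AT skew and GC skew. Because GC skew is usually more pronounced than AT skew, we usually consider only GC skew. One way to measure GC skew is to slide a window along the sequence, compute (G - C)/(G + C) for each window, and plot the values. This ratio gives the excess of G over C as a fraction of G + C: positive values indicate the leading strand, and negative values the lagging strand. (Image source: Nature.com) What causes GC skew? Little is known. One possibility is that the leading and lagging strands spend different amounts of time as single-stranded DNA during replication, so they experience different mutational pressures, that is, exposure to different DNA-damaging conditions. Because T·G and G·T mispairs are more common than C·A and A·C mispairs, the more error-prone strand may become relatively enriched in G and T. Another theory relies on the hydrolytic deamination of cytosine, which occurs predominantly in single-stranded DNA. The asymmetric structure of the replication fork leaves the lagging-strand template transiently single-stranded, making it more prone to cytosine deamination. Deamination of cytosine produces uracil, which pairs with adenine during replication, effectively causing a C-to-T mutation. Therefore, C-to-T deamination increases the relative proportions of G and T in that strand and of C and A in its complementary strand. Why is analyzing GC skew important? Because GC skew is positive on the leading strand and negative on the lagging strand, a change in its sign marks the point where the leading strand begins or ends and switches to the lagging strand, and vice versa. This makes GC skew a useful tool for locating the replication origin and terminus in circular chromosomes. Obvious local changes in the plot can reveal events such as recent inversions caused by recombination, or the assimilation of foreign DNA. Loss of DNA does not change the basic shape of the GC skew curve, although recently acquired foreign DNA may cause local deviations. In practice, a plot of GC skew suffers from local fluctuations, so it is better to use the cumulative GC skew: the sum of the GC skew values of adjacent windows from an arbitrary start point in the sequence up to a given point. The figure shows the GC skew and cumulative GC skew of the Wolinella succinogenes DSM1740 genome and illustrates how the sign change of the GC skew marks the replication origin and terminus. At these positions the cumulative GC skew reaches its global minimum (origin) and maximum (terminus).</p> <p>Source: http://www.nature.com/nrmicro/journal/v2/n11/box/nrmicro1024_BX1.html</p> [...]]]></description>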
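				<content:encoded><![CDATA[<p>Given the two keywords "bioinformatics" and "GC", many people's first thought is probably "<a href="http://en.wikipedia.org/wiki/GC-content" target="_blank">GC content</a>" or "<a href="http://en.wikipedia.org/wiki/CpG_island" target="_blank">CpG island</a>". Over the past two weeks I started working on <a href="http://en.wikipedia.org/wiki/Non-coding_RNA" target="_blank">non-coding RNA</a> prediction in <em><a href="http://en.wikipedia.org/wiki/Sinorhizobium_meliloti" target="_blank">Sinorhizobium meliloti</a></em> (Chinese name <a href="http://www.qikan.com.cn/Article/kjsy/kjsy200803/kjsy20080324.html" target="_blank">草木樨中华根瘤菌</a>) and came across a "GC concept" I had never heard of before: GC skew. I could find hardly any detailed introduction to it in the Chinese-language literature, nor an established Chinese translation ("skew" means "<a href="http://www.iciba.com/skew/" target="_blank">askew, biased, slanted</a>", so based on my understanding of the concept I rendered GC-skew as "GC偏移", roughly "GC shift"). Below is my translation of a <a href="http://www.nature.com/" target="_blank">Nature</a> <a href="http://www.nature.com/nrmicro/journal/v2/n11/box/nrmicro1024_BX1.html" target="_blank">Review</a> box, shared here for everyone.</p>
<p><strong>GC skew in microbial genomes</strong><br />
In most bacterial genomes, the leading strand and the lagging strand differ markedly in base composition: the leading strand is rich in G and T, whereas the lagging strand has more A and C. These departures from the base-frequency parity A = T and C = G are called AT skew and GC skew. Because GC skew is usually more pronounced than AT skew, we usually consider only GC skew. One way to measure GC skew is to slide a window along the sequence, compute (G - C)/(G + C) for each window, and plot the values. This ratio gives the excess of G over C as a fraction of G + C: positive values indicate the leading strand, and negative values the lagging strand.<br />
<img title="GC skew in microbial genomes" src="http://www.nature.com/nrmicro/journal/v2/n11/images/nrmicro1024-i1.gif" alt="GC-skew" /><br />
(Image source: <a href="http://www.nature.com/nrmicro/journal/v2/n11/box/nrmicro1024_BX1.html" target="_blank">Nature.com</a>)<br />
What causes GC skew? Little is known. One possibility is that the leading and lagging strands spend different amounts of time as <a href="http://en.wikipedia.org/w/index.php?title=Single-stranded_DNA&amp;redirect=no" target="_blank">single-stranded DNA</a> during replication, so they experience different mutational pressures, that is, exposure to different DNA-damaging conditions. Because T·G and G·T mispairs are more common than C·A and A·C mispairs, the more error-prone strand may become relatively enriched in G and T. Another theory relies on the hydrolytic deamination of cytosine, which occurs predominantly in single-stranded DNA. The asymmetric structure of the <a href="http://en.wikipedia.org/wiki/Replication_fork" target="_blank">replication fork</a> leaves the lagging-strand template transiently single-stranded, making it more prone to cytosine deamination. Deamination of cytosine produces uracil, which pairs with adenine during replication, effectively causing a C-to-T mutation. Therefore, C-to-T deamination increases the relative proportions of G and T in that strand and of C and A in its complementary strand.<br />
Why is analyzing GC skew important? Because GC skew is positive on the leading strand and negative on the lagging strand, a change in its sign marks the point where the leading strand begins or ends and switches to the lagging strand, and vice versa. This makes GC skew a useful tool for locating the replication origin and terminus in <a href="http://en.wikipedia.org/wiki/Circular_bacterial_chromosome" target="_blank">circular chromosomes</a>. Obvious local changes in the plot can reveal events such as recent inversions caused by recombination, or the assimilation of foreign DNA. Loss of DNA does not change the basic shape of the GC skew curve, although recently acquired foreign DNA may cause local deviations.<br />
In practice, a plot of GC skew suffers from local fluctuations, so it is better to use the cumulative GC skew: the sum of the GC skew values of adjacent windows from an arbitrary start point in the sequence up to a given point. The figure shows the GC skew and cumulative GC skew of the <em>Wolinella succinogenes</em> DSM1740 genome and illustrates how the sign change of the GC skew marks the replication origin and terminus. At these positions the cumulative GC skew reaches its global minimum (origin) and maximum (terminus).</p>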
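<p>The sliding-window and cumulative GC skew calculations can be sketched in Python as below. This is a minimal illustration, not the method behind the Nature figure: the 1000 bp window, non-overlapping windows by default, and the function names are assumptions, and the input is assumed to be the top strand of a circular chromosome. Because skew is positive on the leading strand, the cumulative skew falls until the origin and rises until the terminus, so the sketch takes the origin at the minimum and the terminus at the maximum.</p>
```python
from itertools import accumulate


def gc_skew(seq, window=1000, step=None):
    """Return (G - C) / (G + C) for each window along seq.

    step defaults to window (non-overlapping windows); windows that contain
    no G or C are given a skew of 0.0.
    """
    seq = seq.upper()
    step = step or window
    skews = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_gc_skew(skews):
    """Running sum of the window skews, starting from the first window."""
    return list(accumulate(skews))


def estimate_origin_terminus(seq, window=1000, step=None):
    """Estimate (origin, terminus) positions in bp from the cumulative skew.

    Each cumulative value is reported at the end position of its window;
    the origin is the minimum and the terminus the maximum.
    """
    step = step or window
    cumulative = cumulative_gc_skew(gc_skew(seq, window, step))
    if not cumulative:
        raise ValueError("sequence is shorter than one window")
    ends = [i * step + window for i in range(len(cumulative))]
    origin = ends[cumulative.index(min(cumulative))]
    terminus = ends[cumulative.index(max(cumulative))]
    return origin, terminus
```
<p>For a real genome, read the chromosome into a single string and call estimate_origin_terminus(genome, window=10000); the estimates are only as precise as the window size.</p>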
<p>Source: <a href="http://www.nature.com/nrmicro/journal/v2/n11/box/nrmicro1024_BX1.html" target="_blank">http://www.nature.com/nrmicro/journal/v2/n11/box/nrmicro1024_BX1.html</a></p>
]]></content:encoded>
			<wfw:commentRss>https://www.biofacebook.com/?feed=rss2&#038;p=759</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>RDP Tutorials (16s Analysis)</title>
		<link>https://www.biofacebook.com/?p=634</link>
		<comments>https://www.biofacebook.com/?p=634#comments</comments>
		<pubDate>Wed, 12 Sep 2012 07:43:23 +0000</pubDate>
		<dc:creator><![CDATA[szypanther]]></dc:creator>
				<category><![CDATA[Next-generation sequencing]]></category>
		<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[metagenome]]></category>
		<category><![CDATA[mothur]]></category>

		<guid isPermaLink="false">http://www.biofacebook.com/?p=634</guid>
		<description><![CDATA[Contents Workflows: <p>Processing 16S rRNA data using an unsupervised method</p> <p>Processing 16S rRNA data using a supervised method</p> <p>Processing functional gene data using a supervised method</p> Individual tools: <p>Using the Pipeline Initial Process</p> <p>Align 16S rRNA sequences using Infernal Aligner</p> <p>Using the RDP Classifier</p> <p>Using the RDP MultiClassifier</p> <p>Performing Complete Linkage Clustering</p> <p>&#8211;Using the [...]]]></description>
				<content:encoded><![CDATA[<h1>Contents</h1>
<h2>Workflows:</h2>
<p><a href="http://rdp.cme.msu.edu/tutorials/workflows/16S_unsupervised_flow.html">Processing<strong> 16S rRNA</strong> data using an <strong>unsupervised method</strong></a></p>
<p><a href="http://rdp.cme.msu.edu/tutorials/workflows/16S_supervised_flow.html">Processing<strong> 16S rRNA</strong> data using a <strong>supervised method</strong></a></p>
<p><a href="http://rdp.cme.msu.edu/tutorials/workflows/gene_unsupervised_flow.html">Processing<strong> functional gene</strong> data using a <strong>supervised method</strong></a></p>
<h2>Individual tools:</h2>
<p><a href="http://rdp.cme.msu.edu/tutorials/init_process/RDPtutorial_INITIAL-PROCESS.html">Using the<strong> Pipeline Initial Process</strong></a></p>
<p><a href="http://rdp.cme.msu.edu/tutorials/aligner/RDPtutorial_ALIGNER.html">Align 16S rRNA sequences using <strong>Infernal Aligner</strong></a></p>
<p><a href="http://rdp.cme.msu.edu/tutorials/classifier/RDPtutorial_RDP-CLASSIFIER.html">Using the <strong>RDP Classifier</strong></a></p>
<p><a href="http://rdp.cme.msu.edu/tutorials/classifier/RDPtutorial_MULTICLASSIFIER.html">Using the <strong>RDP MultiClassifier</strong></a></p>
<p><a href="http://rdp.cme.msu.edu/tutorials/cluster/RDPtutorial_CLUSTERING.html">Performing<strong> Complete Linkage Clustering</strong></a></p>
<p><a href="http://rdp.cme.msu.edu/tutorials/cluster/RDPtutorial_CLUST-RESULTS.html">&#8211;Using the <strong>.clust File Results (for abundance stats, diversity stats, OTU matrix or rarefaction)</strong></a></p>
<p><a href="http://rdp.cme.msu.edu/tutorials/stats/RDPtutorial_statistics.html">Performing<strong> statistical analysis (coming soon)</strong></a></p>
<p><a href="http://rdp.cme.msu.edu/tutorials/aligner/RDPtutorial_HMMER3-ALIGNER.html">Align protein using <strong>HMMER3 Aligner</strong></a></p>
<p><a href="http://rdp.cme.msu.edu/tutorials/framebot/RDPtutorial_FRAMEBOT.html">Frameshift-correction and closest match assignment by <strong>RDP FrameBot</strong></a></p>
]]></content:encoded>
			<wfw:commentRss>https://www.biofacebook.com/?feed=rss2&#038;p=634</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>454 pyrosequencing analysis pipeline</title>
		<link>https://www.biofacebook.com/?p=494</link>
		<comments>https://www.biofacebook.com/?p=494#comments</comments>
		<pubDate>Thu, 16 Aug 2012 08:27:16 +0000</pubDate>
		<dc:creator><![CDATA[szypanther]]></dc:creator>
				<category><![CDATA[My project]]></category>
		<category><![CDATA[metagenome]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://www.biofacebook.com/?p=494</guid>
		<description><![CDATA[<p>mothur &#62; sffinfo(sff=454Reads_archaea.sff, flow=T) Extracting info from 454Reads_archaea.sff &#8230; 10000 20000 30000 40000 50000 60000 70000 80000 90000 92115 It took 68 secs to extract 92115. Output File Names: 454Reads_archaea.fasta 454Reads_archaea.qual 454Reads_archaea.flow</p> <p>mothur &#62; trim.flows(flow=454Reads_archaea.flow, oligos=oligos_LXY.txt, pdiffs=2, bdiffs=1, processors=2) Appending files from process 15674</p> <p>Output File Names: 454Reads_archaea.trim.flow 454Reads_archaea.scrap.flow 454Reads_archaea.GZ_ARC.flow 454Reads_archaea.GZ1122_ARC.flow 454Reads_archaea.GZ1122cellulose_ARC.flow 454Reads_archaea.GZ_xylan_ARC.flow 454Reads_archaea.GZ_cellulose55_ARC.flow 454Reads_archaea.SHX_xylan_ARC.flow [...]]]></description>
				<content:encoded><![CDATA[<p>mothur &gt; sffinfo(sff=454Reads_archaea.sff, flow=T)<br />
Extracting info from 454Reads_archaea.sff &#8230;<br />
10000<br />
20000<br />
30000<br />
40000<br />
50000<br />
60000<br />
70000<br />
80000<br />
90000<br />
92115<br />
It took 68 secs to extract 92115.<br />
Output File Names:<br />
454Reads_archaea.fasta<br />
454Reads_archaea.qual<br />
454Reads_archaea.flow</p>
<p>mothur &gt; trim.flows(flow=454Reads_archaea.flow, oligos=oligos_LXY.txt, pdiffs=2, bdiffs=1, processors=2)<br />
Appending files from process 15674</p>
<p>Output File Names:<br />
454Reads_archaea.trim.flow<br />
454Reads_archaea.scrap.flow<br />
454Reads_archaea.GZ_ARC.flow<br />
454Reads_archaea.GZ1122_ARC.flow<br />
454Reads_archaea.GZ1122cellulose_ARC.flow<br />
454Reads_archaea.GZ_xylan_ARC.flow<br />
454Reads_archaea.GZ_cellulose55_ARC.flow<br />
454Reads_archaea.SHX_xylan_ARC.flow<br />
454Reads_archaea.GZ_xylose_ARC.flow<br />
454Reads_archaea.Eric_ARC.flow<br />
454Reads_archaea.Milk_D_ARC.flow<br />
454Reads_archaea.Milk_E_ARC.flow<br />
454Reads_archaea.ST1219_ARC.flow<br />
454Reads_archaea.YL_ARC.flow<br />
454Reads_archaea.SHX_xylose_ARC.flow<br />
454Reads_archaea.SHX_cellulose55_ARC.flow<br />
454Reads_archaea.TP_1201_ARC.flow<br />
454Reads_archaea.ST_ARC.flow<br />
454Reads_archaea.YL0203cellulose_ARC.flow<br />
454Reads_archaea.TP_xylan_ARC.flow<br />
454Reads_archaea.ST0303cellulose_ARC.flow<br />
454Reads_archaea.SHX_ARC.flow<br />
454Reads_archaea.ST_xylan_ARC.flow<br />
454Reads_archaea.YL_xylan_ARC.flow<br />
454Reads_archaea.SHX1219_ARC.flow<br />
454Reads_archaea.SHX1125cellulose_ARC.flow<br />
454Reads_archaea.flow.files</p>
<p>mothur &gt; shhh.flows(file=454Reads_archaea.flow.files, processors=4)</p>
<p>mothur &gt; trim.seqs(fasta=454Reads_archaea.shhh.fasta, name=454Reads_archaea.shhh.names, oligos=oligos_LXY.txt, pdiffs=2, bdiffs=1, maxhomop=8, minlength=150, flip=T, processors=2)</p>
<p>Total of all groups is 44091</p>
<p>Output File Names:<br />
454Reads_archaea.shhh.trim.fasta<br />
454Reads_archaea.shhh.scrap.fasta<br />
454Reads_archaea.shhh.trim.names<br />
454Reads_archaea.shhh.scrap.names<br />
454Reads_archaea.shhh.groups</p>
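<p>Besides barcode and primer matching, the trim.seqs call above sends reads to the scrap file if they contain a homopolymer longer than maxhomop=8 or are shorter than minlength=150. The sketch below shows just those two checks in Python; it is an illustration rather than mothur's code, the function names are hypothetical, and it ignores bdiffs, pdiffs and flip=T.</p>
```python
from itertools import groupby


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base, e.g. 'AAAT' -> 3."""
    return max((len(list(run)) for _, run in groupby(seq.upper())), default=0)


def passes_screen(seq, maxhomop=8, minlength=150):
    """Keep a read only if it is long enough and has no overly long homopolymer."""
    return len(seq) >= minlength and maxhomop >= longest_homopolymer(seq)
```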
<p>mothur &gt; summary.seqs(fasta=454Reads_archaea.shhh.trim.fasta, name=454Reads_archaea.shhh.trim.names)<br />
Using 2 processors.<br />
Start End NBases Ambigs Polymer NumSeqs<br />
Minimum: 1 218 218 0 3 1<br />
2.5%-tile: 1 251 251 0 3 1103<br />
25%-tile: 1 268 268 0 4 11023<br />
Median: 1 274 274 0 4 22046<br />
75%-tile: 1 281 281 0 4 33069<br />
97.5%-tile: 1 297 297 0 5 42989<br />
Maximum: 1 333 333 0 8 44091<br />
Mean: 1 273.837 273.837 0 4.15944<br />
# of unique seqs: 12780<br />
total # of seqs: 44091</p>
<p>Output File Name:<br />
454Reads_archaea.shhh.trim.summary</p>
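<p>In each summary.seqs table, every row is read off at the rank shown in the NumSeqs column, counting all reads represented in the names file. The ranks in every table of this log fit rank = int(total × fraction) + 1 (for example, int(44091 × 0.025) + 1 = 1103). The sketch below reproduces that pattern for one sorted column, such as NBases. The rule is inferred from the numbers in this log, not taken from mothur's source, and the function names are hypothetical.</p>
```python
def percentile_ranks(total):
    """1-based ranks for the Minimum, 2.5%, 25%, Median, 75%, 97.5% and Maximum rows."""
    fractions = [0.025, 0.25, 0.5, 0.75, 0.975]
    return [1] + [int(total * f) + 1 for f in fractions] + [total]


def summarize(values_with_counts):
    """values_with_counts: (value, number of reads) pairs for one column,
    e.g. (NBases, size of the names-file entry) per unique sequence.
    Returns (rank, value) pairs in table order."""
    expanded = sorted(v for v, count in values_with_counts for _ in range(count))
    return [(rank, expanded[rank - 1]) for rank in percentile_ranks(len(expanded))]
```
<p>For example, percentile_ranks(44091) gives [1, 1103, 11023, 22046, 33069, 42989, 44091], the NumSeqs column of the table above, and summarize([(250, 1), (260, 2), (270, 1)]) returns [(1, 250), (1, 250), (2, 260), (3, 260), (4, 270), (4, 270), (4, 270)].</p>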
<p>mothur &gt; unique.seqs(fasta=454Reads_archaea.shhh.trim.fasta, name=454Reads_archaea.shhh.trim.names)</p>
<p>1000 959<br />
2000 1691<br />
3000 2431<br />
4000 3358<br />
5000 4352<br />
6000 5335<br />
7000 6328<br />
8000 7261<br />
9000 8187<br />
10000 9082<br />
11000 9963<br />
12000 10859<br />
12780 11449</p>
<p>Output File Names:<br />
454Reads_archaea.shhh.trim.unique.fasta<br />
454Reads_archaea.shhh.trim.unique.names</p>
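<p>unique.seqs collapses identical sequences into one representative and records the merged read names in a names file (representative, a tab, then comma-separated read names). A minimal sketch of that idea, assuming the input names file already maps each representative to its reads (as the .shhh.trim.names file does here) and that the first name seen for a sequence becomes its representative; mothur's choice of representative and output order may differ, and the function names are hypothetical.</p>
```python
def unique_seqs(fasta_records, names=None):
    """fasta_records: (name, sequence) pairs; names: dict rep -> read names.
    Returns the unique (rep, sequence) records and a dict rep -> merged reads."""
    names = names or {}
    first_name_for_seq = {}
    merged = {}
    for name, seq in fasta_records:
        members = names.get(name, [name])
        rep = first_name_for_seq.setdefault(seq, name)
        merged.setdefault(rep, []).extend(members)
    unique_fasta = [(rep, seq) for seq, rep in first_name_for_seq.items()]
    return unique_fasta, merged


def names_file_lines(merged):
    """Format the mapping as names-file lines: rep, a tab, comma-separated reads."""
    return [rep + "\t" + ",".join(members) for rep, members in merged.items()]
```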
<p>mothur &gt; summary.seqs(fasta=454Reads_archaea.shhh.trim.unique.fasta, name=454Reads_archaea.shhh.trim.unique.names)<br />
Using 2 processors.</p>
<p>Start End NBases Ambigs Polymer NumSeqs<br />
Minimum: 1 218 218 0 3 1<br />
2.5%-tile: 1 251 251 0 3 1103<br />
25%-tile: 1 268 268 0 4 11023<br />
Median: 1 274 274 0 4 22046<br />
75%-tile: 1 281 281 0 4 33069<br />
97.5%-tile: 1 297 297 0 5 42989<br />
Maximum: 1 333 333 0 8 44091<br />
Mean: 1 273.837 273.837 0 4.15944<br />
# of unique seqs: 11449<br />
total # of seqs: 44091</p>
<p>Output File Name:<br />
454Reads_archaea.shhh.trim.unique.summary</p>
<p>Submit the unique sequences to the RDP Classifier, then identify and filter out the bacterial sequences.</p>
<p>http://rdp.cme.msu.edu/classifier/cl_status.jsp</p>
<p>domain Bacteria (1435 sequences)</p>
<p>shenzy@shenzy-ubuntu:/winxp_disk2/shenzy/xiaoying_work_archaea/16s_archaea_allsamples/shhh_pipe2$ wc allrank_454Reads_archaea.shhh.trim.unique.fasta_classified.txt<br />
1435 1520 200784 allrank_454Reads_archaea.shhh.trim.unique.fasta_classified.txt</p>
<p>./filter_bacterseqs_for_align.py -i allrank_454Reads_archaea.shhh.trim.unique.fasta_classified.txt -f 454Reads_archaea.shhh.trim.unique.fasta -n 454Reads_archaea.shhh.trim.unique.names -g 454Reads_archaea.shhh.groups</p>
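<p>The author's filter_bacterseqs_for_align.py is not shown. Judging from the counts above (1435 unique sequences classified as Bacteria; the groups file shrinks from 44091 to 42187 reads), it drops every representative whose RDP domain is Bacteria together with all reads merged under it. The following is a hypothetical re-implementation of that behaviour, assuming the RDP allrank line format shown later in this log (name;;Root;100%;Domain;100%;...); the function names are illustrative only.</p>
```python
def bacterial_ids(classified_lines):
    """Return names whose domain (the 5th ';'-separated field) is Bacteria."""
    ids = set()
    for line in classified_lines:
        fields = line.strip().split(";")
        # fields: name, '', 'Root', '100%', domain, confidence, ...
        if len(fields) > 4 and fields[4].strip('"') == "Bacteria":
            ids.add(fields[0])
    return ids


def filter_names(names_lines, drop):
    """Drop names-file lines whose representative is in drop.
    Returns the kept lines and the set of read names they cover."""
    kept, reads = [], set()
    for line in names_lines:
        rep, members = line.rstrip("\n").split("\t")
        if rep not in drop:
            kept.append(line)
            reads.update(members.split(","))
    return kept, reads


def filter_groups(group_lines, keep_reads):
    """Keep groups-file lines (read name, a tab, sample) for retained reads."""
    return [line for line in group_lines if line.split("\t")[0] in keep_reads]


def filter_fasta(records, drop):
    """records: (name, sequence) pairs; remove the listed representatives."""
    return [(name, seq) for name, seq in records if name not in drop]
```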
<p>shenzy@shenzy-ubuntu:/winxp_disk2/shenzy/xiaoying_work_archaea/16s_archaea_allsamples/shhh_pipe2$ wc 454Reads_archaea.shhh.groups.filter<br />
42187 84374 1224728 454Reads_archaea.shhh.groups.filter<br />
shenzy@shenzy-ubuntu:/winxp_disk2/shenzy/xiaoying_work_archaea/16s_archaea_allsamples/shhh_pipe2$ wc 454Reads_archaea.shhh.groups<br />
44091 88182 1270431 454Reads_archaea.shhh.groups</p>
<p>mothur &gt; summary.seqs(fasta=454Reads_archaea.shhh.trim.unique.fasta.filter, name=454Reads_archaea.shhh.trim.unique.names.filter)<br />
Using 2 processors.</p>
<p>Start End NBases Ambigs Polymer NumSeqs<br />
Minimum: 1 218 218 0 3 1<br />
2.5%-tile: 1 256 256 0 3 1055<br />
25%-tile: 1 268 268 0 4 10547<br />
Median: 1 274 274 0 4 21094<br />
75%-tile: 1 282 282 0 4 31641<br />
97.5%-tile: 1 297 297 0 5 41133<br />
Maximum: 1 333 333 0 8 42187<br />
Mean: 1 274.543 274.543 0 4.15355<br />
# of unique seqs: 10014<br />
total # of seqs: 42187<br />
mothur &gt; screen.seqs(fasta=454Reads_archaea.shhh.trim.unique.fasta.align, name=454Reads_archaea.shhh.trim.unique.names.filter, group=454Reads_archaea.shhh.groups.filter, processors=2)<br />
Output File Name:<br />
454Reads_archaea.shhh.trim.unique.fasta.summary<br />
###mothur &gt; align.seqs(candidate=454Reads_archaea.shhh.trim.unique.fasta.filter, template=core_set_aligned.imputed.fasta, flip=T, ksize=9, align=needleman, gapopen=-1, processors=3)<br />
###</p>
<p>mothur &gt; align.seqs(fasta=454Reads_archaea.shhh.trim.unique.fasta.filter, reference=core_set_aligned.fasta.imputed, flip=T, processors=3)<br />
Using 3 processors.</p>
<p>Reading in the core_set_aligned.fasta.imputed template sequences&#8230; DONE.<br />
It took 1 to read 4938 sequences.<br />
Aligning sequences from 454Reads_archaea.shhh.trim.unique.fasta.filter &#8230;<br />
100<br />
&#8230;<br />
3338<br />
Some of you sequences generated alignments that eliminated too many bases, a list is provided in 454Reads_archaea.shhh.trim.unique.fasta.flip.accnos. If the reverse compliment proved to be better it was reported.<br />
It took 60 secs to align 10014 sequences.<br />
Output File Names:<br />
454Reads_archaea.shhh.trim.unique.fasta.align<br />
454Reads_archaea.shhh.trim.unique.fasta.align.report<br />
454Reads_archaea.shhh.trim.unique.fasta.flip.accnos</p>
<p>mothur &gt; summary.seqs(fasta=454Reads_archaea.shhh.trim.unique.fasta.align, name=454Reads_archaea.shhh.trim.unique.names.filter)<br />
Using 3 processors.</p>
<p>Start End NBases Ambigs Polymer NumSeqs<br />
Minimum: 86 98 2 0 1 1<br />
2.5%-tile: 132 1746 51 0 3 1055<br />
25%-tile: 136 1822 268 0 4 10547<br />
Median: 136 1834 274 0 4 21094<br />
75%-tile: 136 1850 282 0 4 31641<br />
97.5%-tile: 194 1887 297 0 5 41133<br />
Maximum: 6858 6885 313 0 8 42187<br />
Mean: 284.168 1920.46 266.145 0 4.10781<br />
# of unique seqs: 10014<br />
total # of seqs: 42187</p>
<p>Output File Name:<br />
454Reads_archaea.shhh.trim.unique.fasta.summary<br />
##mothur &gt; screen.seqs(fasta=454Reads_archaea.shhh.trim.unique.fasta.align, name=454Reads_archaea.shhh.trim.unique.names.filter, group=454Reads_archaea.shhh.groups.filter, ##start=136, optimize=end, criteria=90, processors=2)<br />
#The optimize and criteria parameters allow you to set the start, end, maxambig, maxhomop, minlength and maxlength parameters relative to your set of sequences.<br />
#For example, optimize=start-end, criteria=90 would set the start and end values to the positions at which 90% of your sequences start and end.</p>
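<p>A minimal sketch of the optimize=start-end, criteria=90 rule, under stated assumptions: start is chosen so that 90% of sequences start at or before it, end so that 90% end at or after it, each sequence counts once, and nearest-rank percentiles are used. mothur's exact rounding and any weighting by the names file may differ, and the function names are hypothetical.</p>
```python
import math


def optimize_start_end(starts, ends, criteria=90):
    """Pick (start, end) so that criteria% of sequences start at or before
    start and criteria% end at or after end (nearest-rank percentiles)."""
    k = math.ceil(len(starts) * criteria / 100)  # sequences each bound must satisfy
    start = sorted(starts)[k - 1]                # k-th smallest start position
    end = sorted(ends, reverse=True)[k - 1]      # k-th largest end position
    return start, end


def screen_by_position(seqs, start, end):
    """seqs: (name, start, end) tuples; keep names that start at or before
    start and end at or after end."""
    return [name for name, s, e in seqs if start >= s and e >= end]
```
<p>This matches the summary after screening below, where every retained sequence starts at or before alignment position 136 and ends at or after position 1819.</p>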
<p>mothur &gt; screen.seqs(fasta=454Reads_archaea.shhh.trim.unique.fasta.align, name=454Reads_archaea.shhh.trim.unique.names.filter, group=454Reads_archaea.shhh.groups.filter, optimize=start-end, criteria=90, processors=4)<br />
&#8230;<br />
Output File Names:<br />
454Reads_archaea.shhh.trim.unique.fasta.good.align<br />
454Reads_archaea.shhh.trim.unique.fasta.bad.accnos<br />
454Reads_archaea.shhh.trim.unique.names.good.filter<br />
454Reads_archaea.shhh.groups.good.filter<br />
It took 4 secs to screen 10014 sequences.</p>
<p>mothur &gt; summary.seqs(fasta=454Reads_archaea.shhh.trim.unique.fasta.good.align, name=454Reads_archaea.shhh.trim.unique.names.good.filter)</p>
<p>Using 4 processors.</p>
<p>Start End NBases Ambigs Polymer NumSeqs<br />
Minimum: 107 1819 243 0 3 1<br />
2.5%-tile: 133 1821 263 0 3 925<br />
25%-tile: 136 1831 269 0 4 9242<br />
Median: 136 1836 274 0 4 18484<br />
75%-tile: 136 1853 283 0 4 27725<br />
97.5%-tile: 136 1871 298 0 5 36042<br />
Maximum: 136 1920 313 0 8 36966<br />
Mean: 135.731 1840.24 276.401 0 4.14224<br />
# of unique seqs: 7703<br />
total # of seqs: 36966</p>
<p>Output File Name:<br />
454Reads_archaea.shhh.trim.unique.fasta.good.summary</p>
<p>mothur &gt; filter.seqs(fasta=454Reads_archaea.shhh.trim.unique.fasta.good.align, vertical=T, trump=., processors=2)<br />
3700<br />
3800<br />
3851</p>
<p>Length of filtered alignment: 486<br />
Number of columns removed: 7196<br />
Length of the original alignment: 7682<br />
Number of sequences used to construct filter: 7703</p>
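<p>As I understand the mothur options, vertical=T removes alignment columns that contain only gap characters, and trump=. removes any column in which at least one sequence has a '.'; here 7196 of 7682 columns were removed, leaving 486. A minimal sketch of those two rules, assuming '-' and '.' are the only gap characters; it does not cover other filter.seqs options such as masks, and the function name is hypothetical.</p>
```python
GAP_CHARS = set("-.")


def filter_columns(aligned, vertical=True, trump="."):
    """aligned: equal-length aligned sequences.
    Returns the filtered sequences and the number of columns removed."""
    length = len(aligned[0])
    keep = []
    for col in range(length):
        column = [seq[col] for seq in aligned]
        if vertical and all(ch in GAP_CHARS for ch in column):
            continue  # gap-only column
        if trump and trump in column:
            continue  # at least one sequence has the trump character here
        keep.append(col)
    filtered = ["".join(seq[col] for col in keep) for seq in aligned]
    return filtered, length - len(keep)
```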
<p>Output File Names:<br />
454Reads_archaea.filter<br />
454Reads_archaea.shhh.trim.unique.fasta.good.filter.fasta<br />
mothur &gt; unique.seqs(fasta=454Reads_archaea.shhh.trim.unique.fasta.good.filter.fasta, name=454Reads_archaea.shhh.trim.unique.names.good.filter)</p>
<p>1000 974<br />
2000 1887<br />
3000 2768<br />
4000 3604<br />
5000 4424<br />
6000 5238<br />
7000 6017<br />
7703 6573</p>
<p>Output File Names:<br />
454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.fasta<br />
454Reads_archaea.shhh.trim.unique.fasta.good.filter.names</p>
<p>mothur &gt; shhh.seqs(fasta=454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.fasta, name=454Reads_archaea.shhh.trim.unique.fasta.good.filter.names, group=454Reads_archaea.shhh.groups.good.filter, processors=3)<br />
Output File Names:<br />
454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.unique.fasta<br />
454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.unique.names</p>
<p>/******************************************/</p>
<p>Output File Names:<br />
454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh.Eric_ARC.map<br />
454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh.GZ1122_ARC.map<br />
454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh.GZ1122cellulose_ARC.map<br />
&#8230;&#8230;.</p>
<p>mothur &gt; summary.seqs(fasta=454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.fasta, name=454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.names)<br />
Using 2 processors.</p>
<p>Start End NBases Ambigs Polymer NumSeqs<br />
Minimum: 1 484 219 0 3 1<br />
2.5%-tile: 1 486 260 0 3 925<br />
25%-tile: 1 486 261 0 4 9242<br />
Median: 1 486 261 0 4 18484<br />
75%-tile: 1 486 266 0 4 27725<br />
97.5%-tile: 1 486 266 0 5 36042<br />
Maximum: 3 486 282 0 7 36966<br />
Mean: 1.00103 486 262.434 0 4.1387<br />
# of unique seqs: 2911<br />
total # of seqs: 36966</p>
<p>Output File Name:<br />
454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.summary</p>
<p>mothur &gt; chimera.uchime(fasta=454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.fasta, name=454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.names, group=454Reads_archaea.shhh.groups.good.filter, processors=3)<br />
It took 0 secs to check 46 sequences from group YL_xylan_ARC.</p>
<p>It took 43 secs to check 3276 sequences. 362 chimeras were found.<br />
The number of sequences checked may be larger than the number of unique sequences because some sequences are found in several samples.</p>
<p>Output File Names:<br />
454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.uchime.chimeras<br />
454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.uchime.accnos<br />
##############################<br />
shenzy@shenzy-ubuntu:/winxp_disk2/shenzy/xiaoying_work_archaea/16s_archaea_allsamples/shhh_pipe2$ mv 454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.uchime.accnos 454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.uchime.accnos.self<br />
shenzy@shenzy-ubuntu:/winxp_disk2/shenzy/xiaoying_work_archaea/16s_archaea_allsamples/shhh_pipe2$ mv 454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.uchime.chimeras 454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.uchime.chimeras.self<br />
################################</p>
<p>mothur &gt; chimera.uchime(fasta=454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.fasta, reference=core_set_aligned.fasta.imputed, processors=3)<br />
05:04 26Mb 100.0% 30/969 chimeras found (3.1%)<br />
05:11 26Mb 100.0% 88/970 chimeras found (9.1%)</p>
<p>It took 311 secs to check 2911 sequences. 213 chimeras were found.</p>
<p>Output File Names:<br />
454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.uchime.chimeras<br />
454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.uchime.accnos</p>
<p>###################################<br />
shenzy@shenzy-ubuntu:/winxp_disk2/shenzy/xiaoying_work_archaea/16s_archaea_allsamples/shhh_pipe2$ wc 454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.uchime.accnos<br />
213 213 3195 454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.uchime.accnos<br />
shenzy@shenzy-ubuntu:/winxp_disk2/shenzy/xiaoying_work_archaea/16s_archaea_allsamples/shhh_pipe2$ wc 454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.uchime.accnos.self<br />
362 362 5430 454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.uchime.accnos.self<br />
cat 454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.uchime.accnos 454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.uchime.accnos.self &gt; 454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.uchime.accnos.sum</p>
<p>sort 454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.uchime.accnos.sum &gt; 454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.uchime.accnos.sum.sort</p>
<p>Merge the two sets of predicted chimeras and delete duplicate IDs:<br />
shenzy@shenzy-ubuntu:/winxp_disk2/shenzy/xiaoying_work_archaea/16s_archaea_allsamples/shhh_pipe2$ sed '$!N; /^\(.*\)\n\1$/!P; D' 454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.uchime.accnos.sum.sort &gt; 454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.uchime.accnos.sum.sort.uniq<br />
shenzy@shenzy-ubuntu:/winxp_disk2/shenzy/xiaoying_work_archaea/16s_archaea_allsamples/shhh_pipe2$ wc 454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.uchime.accnos.sum.sort.uniq<br />
382 382 5730 454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.uchime.accnos.sum.sort.uniq<br />
####################################</p>
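<p>The cat, sort and sed steps above take the union of the two UCHIME accnos lists (213 reference-based plus 362 de novo IDs, 382 distinct). The same union in Python, as a sketch assuming one ID per line in each file; the function names are hypothetical. On the shell side, sort -u performs the sort and deduplication in one step.</p>
```python
def merge_accnos(*id_lists):
    """Union of several accnos lists (one sequence name per entry), sorted."""
    merged = set()
    for ids in id_lists:
        merged.update(name.strip() for name in ids if name.strip())
    return sorted(merged)


def merge_accnos_files(paths, out_path):
    """Read accnos files, write their union to out_path, return the count."""
    id_lists = []
    for path in paths:
        with open(path) as handle:
            id_lists.append(handle.read().split())
    merged = merge_accnos(*id_lists)
    with open(out_path, "w") as out:
        out.writelines(name + "\n" for name in merged)
    return len(merged)
```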
<p>get_fasta_from_seqname.py -i 454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.uchime.accnos.sum.sort.uniq -j 454Reads_archaea.fasta &gt; 454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.uchime.accnos.sum.sort.uniq.fasta</p>
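<p>get_fasta_from_seqname.py is the author's own script and is not shown. A hypothetical minimal equivalent, assuming -i is a file of sequence names (one per line) and -j a FASTA file, and writing the matching records to standard output; the option names mirror the command above, but everything else is illustrative.</p>
```python
import argparse
import sys


def read_fasta(lines):
    """Yield (name, sequence) pairs; the name is the first word after '>'."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def select_records(fasta_lines, wanted):
    """Return the (name, sequence) records whose name is in wanted."""
    wanted = set(wanted)
    return [(name, seq) for name, seq in read_fasta(fasta_lines) if name in wanted]


def main(argv=None):
    parser = argparse.ArgumentParser(description="Extract FASTA records by name (sketch).")
    parser.add_argument("-i", required=True, help="file with one sequence name per line")
    parser.add_argument("-j", required=True, help="FASTA file to search")
    args = parser.parse_args(argv)
    with open(args.i) as handle:
        wanted = handle.read().split()
    with open(args.j) as handle:
        for name, seq in select_records(handle, wanted):
            sys.stdout.write(">" + name + "\n" + seq + "\n")
```
<p>From a script entry point, call main() with the same arguments, e.g. main(["-i", "ids.txt", "-j", "454Reads_archaea.fasta"]).</p>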
<p>Check the chimera candidates with the RDP Classifier (http://rdp.cme.msu.edu/classifier/classifier.jsp).<br />
Look at the confidence of the last (genus-level) assignment: if it is &gt;= 90%, keep the sequence and merge it back into the non-chimeric reads of each sample.<br />
shenzy@shenzy-ubuntu:/winxp_disk2/shenzy/xiaoying_work_archaea/16s_archaea_allsamples/shhh_pipe2$ more allrank_454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.uchime.accnos.sum.sort.uniq.fasta_classified.txt<br />
HQ93PQ301A0ZHW;;Root;100%;Archaea;100%;"Euryarchaeota";100%;"Methanomicrobia";100%;Methanomicrobiales;100%;Methanospirillaceae;100%;Methanospirillum;100%<br />
HQ93PQ301A1JYN;;Root;100%;Archaea;100%;"Euryarchaeota";100%;"Methanomicrobia";100%;Methanosarcinales;100%;Methanosarcinaceae;100%;Methanosarcina;100%<br />
shenzy@shenzy-ubuntu:/winxp_disk2/shenzy/xiaoying_work_archaea/16s_archaea_allsamples/shhh_pipe2$ check_real_chimera_seq.py -i allrank_454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.uchime.accnos.sum.sort.uniq.fasta_classified.txt -d 90 | wc<br />
221 221 3315</p>
<p>These 221 sequences should be merged back into the non-chimera results, leaving 161 (382 - 221) sequences to remove as chimeras.</p>
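<p>check_real_chimera_seq.py is not shown either. A hypothetical sketch of the rule described above, using the allrank line format printed by more: take the last confidence value in each line (the genus level when the classification reaches genus) and report the sequence name if it is at least the threshold passed with -d (90 here). Whether the author's script parses the lines exactly this way is an assumption.</p>
```python
def last_rank_confidence(line):
    """Return (name, confidence %) from an RDP allrank line such as
    'ID;;Root;100%;Archaea;100%;...;Methanospirillum;100%'."""
    fields = [f for f in line.strip().split(";") if f]
    name = fields[0]
    confidence = float(fields[-1].rstrip("%"))
    return name, confidence


def rescued_ids(lines, min_confidence=90.0):
    """Chimera candidates whose last-rank confidence is at least min_confidence."""
    rescued = []
    for line in lines:
        if not line.strip():
            continue
        name, confidence = last_rank_confidence(line)
        if confidence >= min_confidence:
            rescued.append(name)
    return rescued
```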
<p>-rwxrwxrwx 1 root root 2.4K 2012-08-14 11:19 chimera.seqs.name<br />
shenzy@shenzy-ubuntu:/winxp_disk2/shenzy/xiaoying_work_archaea/16s_archaea_allsamples/shhh_pipe2$ wc chimera.seqs.name<br />
161 161 2415 chimera.seqs.name</p>
<p>######################################################################<br />
Removing chimeras (all sequences predicted by the two approaches, minus those rescued by the RDP check)<br />
######################################################################<br />
mothur &gt; remove.seqs(accnos=chimera.seqs.name, fasta=454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.fasta, name=454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.names, group=454Reads_archaea.shhh.groups.good.filter)</p>
<p>Removed 1197 sequences from your name file.<br />
Removed 161 sequences from your fasta file.<br />
Removed 1197 sequences from your group file.</p>
<p>Output File Names:<br />
454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.pick.names<br />
454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.pick.fasta<br />
454Reads_archaea.shhh.groups.good.pick.filter<br />
mothur &gt; summary.seqs(name=current)</p>
<p>Using 454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.pick.names as input file for the name parameter.<br />
Using 454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.pick.fasta as input file for the fasta parameter.</p>
<p>Using 3 processors.</p>
<p>Start End NBases Ambigs Polymer NumSeqs<br />
Minimum: 1 484 219 0 3 1<br />
2.5%-tile: 1 486 260 0 3 895<br />
25%-tile: 1 486 261 0 4 8943<br />
Median: 1 486 261 0 4 17885<br />
75%-tile: 1 486 266 0 4 26827<br />
97.5%-tile: 1 486 266 0 5 34875<br />
Maximum: 3 486 278 0 7 35769<br />
Mean: 1.00106 486 262.504 0 4.16914<br />
# of unique seqs: 2750<br />
total # of seqs: 35769</p>
<p>Output File Name:<br />
454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.pick.summary<br />
#############################<br />
Number of chimeras per sample<br />
##############################<br />
./compute_chimera_for_singlesample.py -i 454Reads_Bacteria.Eric_BAC.shhh.groups -j chimera.seqs.name</p>
<p>GZ1122_ARC: 10<br />
GZ1122cellulose_ARC: 3<br />
GZ_ARC: 1<br />
GZ_cellulose55_ARC: 1<br />
GZ_xylan_ARC: 7<br />
GZ_xylose_ARC: 4<br />
SHX1125cellulose_ARC: 0<br />
SHX1219_ARC: 0<br />
SHX_ARC: 5<br />
SHX_cellulose55_ARC: 8<br />
SHX_xylan_ARC: 3<br />
SHX_xylose_ARC: 35<br />
ST0303cellulose_ARC: 15<br />
ST1219_ARC: 5<br />
ST_ARC: 21<br />
ST_xylan_ARC: 11<br />
TP_1201_ARC: 0<br />
TP_xylan_ARC: 19<br />
YL0203cellulose_ARC: 7<br />
YL_ARC: 2<br />
YL_xylan_ARC: 3</p>
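<p>The compute_chimera_for_singlesample.py script itself is not shown in this post. The sketch below is a rough, hypothetical stand-in. It assumes chimera.seqs.name is an accnos-style list with one sequence ID per line, and that the group file maps each read ID to its sample (tab-separated). It counts each flagged ID once and lists samples with no hits as 0:</p>

```python
"""Hypothetical stand-in for compute_chimera_for_singlesample.py (the original is not shown)."""
import io
from collections import Counter


def chimeras_per_group(group_handle, accnos_handle):
    """Count flagged sequence IDs per sample; samples with no flagged IDs are reported as 0."""
    group_of = {}
    for line in group_handle:
        if line.strip():
            read_id, group = line.split()
            group_of[read_id] = group
    flagged = {line.strip() for line in accnos_handle if line.strip()}
    counts = Counter(group_of[r] for r in flagged if r in group_of)
    return {g: counts.get(g, 0) for g in sorted(set(group_of.values()))}


# Made-up IDs: a2 and b1 were flagged as chimeric.
groups = io.StringIO("a1\tGZ_ARC\na2\tGZ_ARC\nb1\tST_ARC\nc1\tYL_ARC\n")
accnos = io.StringIO("a2\nb1\n")
result = chimeras_per_group(groups, accnos)
for sample, n in result.items():
    print(f"{sample}: {n}")
```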
<p>&nbsp;</p>
<p>#######################<br />
Removing &#8220;contaminants&#8221;<br />
#######################<br />
wget http://www.mothur.org/w/images/5/59/Trainset9_032012.pds.zip<br />
shenzy@shenzy-ubuntu:/winxp_disk2/shenzy/xiaoying_work_archaea/16s_archaea_allsamples/shhh_pipe2$ unzip Trainset9_032012.pds.zip<br />
Archive: Trainset9_032012.pds.zip<br />
inflating: trainset9_032012.pds.tax<br />
inflating: trainset9_032012.pds.fasta</p>
<p>mothur &gt; classify.seqs(fasta=454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.pick.fasta, name=454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.pick.names, group=454Reads_archaea.shhh.groups.good.pick.filter, template=trainset9_032012.pds.fasta, taxonomy=trainset9_032012.pds.tax, cutoff=80, processors=2)<br />
&#8230;.<br />
Processing sequence: 1300<br />
Processing sequence: 1300<br />
[WARNING]: HQ93PQ301C4CTV could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.<br />
[WARNING]: HQ93PQ301CK6YP could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.<br />
[WARNING]: HQ93PQ301DJMTI could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.<br />
[WARNING]: HQ93PQ301ERRC6 could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences.<br />
Processing sequence: 1372<br />
Processing sequence: 1371<br />
It took 25 secs to classify 2750 sequences.</p>
<p>Reading 454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.pick.names&#8230; Done.</p>
<p>It took 3 secs to create the summary file for 2750 sequences.<br />
Output File Names:<br />
454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.pick.pds.taxonomy<br />
454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.pick.pds.flip.accnos<br />
454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.pick.pds.tax.summary</p>
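<p>In the .taxonomy file, each rank carries the classifier's bootstrap confidence in parentheses, for example Archaea(100);. With cutoff=80, ranks supported below 80% are not assigned. mothur has already applied that cutoff when writing the file. The sketch below (hypothetical helper names, made-up example line) only shows how the parenthesised values relate to the cutoff:</p>

```python
"""Parse a mothur-style taxonomy line and keep ranks whose bootstrap support meets a cutoff."""
import re

# One rank looks like 'Euryarchaeota(100)': taxon name, then bootstrap confidence.
RANK = re.compile(r"^(.+)\((\d+)\)$")


def parse_taxonomy(line):
    """Split 'seqID[TAB]Taxon(conf);Taxon(conf);...' into (seqID, [(taxon, conf), ...])."""
    seq_id, lineage = line.rstrip("\n").split("\t")
    ranks = []
    for field in lineage.strip(";").split(";"):
        m = RANK.match(field)
        ranks.append((m.group(1), int(m.group(2))) if m else (field, None))
    return seq_id, ranks


def truncate(ranks, cutoff=80):
    """Keep ranks down to the first one without a value or with confidence below cutoff."""
    kept = []
    for taxon, conf in ranks:
        if conf is None or cutoff > conf:
            break
        kept.append(taxon)
    return kept


# Made-up example line; confidence drops below 80 at the order level.
seq_id, ranks = parse_taxonomy(
    "toy1\tArchaea(100);Euryarchaeota(100);Methanomicrobia(92);Methanosarcinales(71);\n"
)
print(seq_id, truncate(ranks))  # -> toy1 ['Archaea', 'Euryarchaeota', 'Methanomicrobia']
```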
<p>&nbsp;</p>
<p>mothur &gt; remove.lineage(fasta=454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.pick.fasta, name=454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.pick.names, group=454Reads_archaea.shhh.groups.good.pick.filter, taxonomy=454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.pick.pds.taxonomy, taxon=Mitochondria-Cyanobacteria_Chloroplast-Eukarya-Bacteria-unknown)<br />
Output File Names:<br />
454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.pick.pds.pick.taxonomy<br />
454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.pick.pick.names<br />
454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.pick.pick.fasta<br />
454Reads_archaea.shhh.groups.good.pick.pick.filter<br />
mothur &gt; summary.seqs(name=current)</p>
<p>Using 454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.pick.pick.names as input file for the name parameter.<br />
Using 454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.pick.pick.fasta as input file for the fasta parameter.</p>
<p>Using 2 processors.</p>
<p>Start End NBases Ambigs Polymer NumSeqs<br />
Minimum: 1 484 219 0 3 1<br />
2.5%-tile: 1 486 260 0 3 852<br />
25%-tile: 1 486 261 0 4 8519<br />
Median: 1 486 261 0 4 17037<br />
75%-tile: 1 486 266 0 4 25555<br />
97.5%-tile: 1 486 266 0 5 33221<br />
Maximum: 1 486 278 0 7 34072<br />
Mean: 1 486 262.598 0 4.21437<br />
# of unique seqs: 2644<br />
total # of seqs: 34072</p>
<p>Output File Name:<br />
454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.pick.pick.summary</p>
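<p>remove.lineage takes a hyphen-separated list of taxa and drops every sequence whose lineage contains one of them. Comparing the two summaries, 106 unique sequences (2750 to 2644) covering 1697 reads (35769 to 34072) were removed. Below is a simplified Python sketch that assumes an exact rank-by-rank match after stripping the bootstrap values. mothur's own matching rules may differ, and keep_sequence is a hypothetical name:</p>

```python
"""Simplified taxon filter in the spirit of remove.lineage (not mothur's exact matching rules)."""
import re

CONF = re.compile(r"\(\d+\)")  # bootstrap values such as '(100)'


def keep_sequence(lineage, taxon_spec):
    """False if any rank of the lineage equals one of the hyphen-separated unwanted taxa."""
    unwanted = set(taxon_spec.split("-"))
    ranks = [CONF.sub("", r) for r in lineage.strip(";").split(";")]
    return not any(r in unwanted for r in ranks)


spec = "Mitochondria-Cyanobacteria_Chloroplast-Eukarya-Bacteria-unknown"
lineages = {
    "s1": "Archaea(100);Euryarchaeota(100);Methanomicrobia(95);",
    "s2": "Bacteria(100);Firmicutes(97);",
    "s3": "unknown;",
}
kept = [s for s, lin in lineages.items() if keep_sequence(lin, spec)]
print(kept)  # -> ['s1']
```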
<p>############################################################################################<br />
mothur &gt; system(cp 454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.pick.pds.pick.taxonomy archaea_16s_final.taxonomy)<br />
mothur &gt; system(cp 454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.pick.pick.names archaea_16s_final.names)<br />
mothur &gt; system(cp 454Reads_archaea.shhh.trim.unique.fasta.good.filter.unique.shhh_seqs.pick.pick.fasta archaea_16s_final.fasta)<br />
mothur &gt; system(cp 454Reads_archaea.shhh.groups.good.pick.pick.filter archaea_16s_final.groups)<br />
mothur &gt; dist.seqs(fasta=archaea_16s_final.fasta, cutoff=0.1, processors=3)</p>
<p>Output File Name:<br />
archaea_16s_final.dist</p>
<p>It took 25 to calculate the distances for 2644 sequences.</p>
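<p>dist.seqs compares the aligned, filtered sequences pairwise. With cutoff=0.1, it keeps only pairs at or below 0.1 and writes them as three whitespace-separated columns (name, name, distance). The Python sketch below uses a simplified distance that skips every column with a gap in either sequence. mothur's default calc=onegap instead counts a run of gaps as one difference, so its values will not match exactly. Function names and sequences are made up:</p>

```python
"""Pairwise distances between aligned sequences, written in a column (.dist) layout."""
from itertools import combinations

GAP = set("-.")


def distance(a, b):
    """Mismatches / compared columns, skipping any column with a gap in either sequence."""
    compared = mismatches = 0
    for x, y in zip(a, b):
        if x in GAP or y in GAP:
            continue
        compared += 1
        mismatches += x != y
    return mismatches / compared if compared else 1.0


def column_dist(seqs, cutoff=0.1):
    """Yield 'nameA nameB distance' lines only for pairs at or below the cutoff."""
    for (n1, s1), (n2, s2) in combinations(seqs.items(), 2):
        d = distance(s1, s2)
        if cutoff >= d:
            yield f"{n1} {n2} {d:.6f}"


seqs = {
    "s1": "ACGTACGTACGTACGTACGT",
    "s2": "ACGTACGTACGTACGTACGA",
    "s3": "TGCATGCA--GTACGTACGT",
}
for line in column_dist(seqs):
    print(line)  # -> s1 s2 0.050000
```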
<p>mothur &gt; cluster(column=archaea_16s_final.dist, name=archaea_16s_final.names)<br />
changed cutoff to 0.0392497</p>
<p>Output File Names:<br />
archaea_16s_final.an.sabund<br />
archaea_16s_final.an.rabund<br />
archaea_16s_final.an.list</p>
<p>It took 9 seconds to cluster</p>
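<p>The .an. in the output names marks average-neighbor clustering. Two OTUs merge when the mean of all pairwise distances between their members is within the distance label (0.01, 0.02, 0.03, ...). Below is a naive Python sketch for a small, complete distance matrix at a single threshold. mothur works from the sparse .dist file plus the name file and writes OTUs for every label, so this only illustrates the linkage rule. average_linkage is a hypothetical name:</p>

```python
"""Naive average-linkage (UPGMA-style) clustering at a single distance threshold."""
from itertools import combinations


def average_linkage(names, dist, threshold):
    """Merge the closest pair of clusters while their average distance is within threshold.

    dist maps frozenset({a, b}) to the distance between sequences a and b (all pairs present).
    """
    clusters = [[n] for n in names]

    def avg(c1, c2):
        return sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), avg(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # i is always smaller than j here
        del clusters[j]
    return clusters


pairs = {("a", "b"): 0.01, ("a", "c"): 0.02, ("b", "c"): 0.03,
         ("a", "d"): 0.2, ("b", "d"): 0.2, ("c", "d"): 0.2}
dist = {frozenset(p): d for p, d in pairs.items()}
print(average_linkage(["a", "b", "c", "d"], dist, 0.03))  # -> [['a', 'b', 'c'], ['d']]
```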
<p>mothur &gt; make.shared(list=archaea_16s_final.an.list, group=archaea_16s_final.groups)</p>
<p>unique<br />
0.01<br />
0.02<br />
0.03</p>
<p>Output File Names:<br />
archaea_16s_final.an.shared<br />
archaea_16s_final.an.Eric_ARC.rabund<br />
archaea_16s_final.an.GZ1122_ARC.rabund<br />
archaea_16s_final.an.GZ1122cellulose_ARC.rabund<br />
archaea_16s_final.an.GZ_ARC.rabund<br />
archaea_16s_final.an.GZ_cellulose55_ARC.rabund<br />
archaea_16s_final.an.GZ_xylan_ARC.rabund<br />
archaea_16s_final.an.GZ_xylose_ARC.rabund<br />
archaea_16s_final.an.Milk_D_ARC.rabund<br />
archaea_16s_final.an.Milk_E_ARC.rabund<br />
archaea_16s_final.an.SHX1125cellulose_ARC.rabund<br />
archaea_16s_final.an.SHX1219_ARC.rabund<br />
archaea_16s_final.an.SHX_ARC.rabund<br />
archaea_16s_final.an.SHX_cellulose55_ARC.rabund<br />
archaea_16s_final.an.SHX_xylan_ARC.rabund<br />
archaea_16s_final.an.SHX_xylose_ARC.rabund<br />
archaea_16s_final.an.ST0303cellulose_ARC.rabund<br />
archaea_16s_final.an.ST1219_ARC.rabund<br />
archaea_16s_final.an.ST_ARC.rabund<br />
archaea_16s_final.an.ST_xylan_ARC.rabund<br />
archaea_16s_final.an.TP_1201_ARC.rabund<br />
archaea_16s_final.an.TP_xylan_ARC.rabund<br />
archaea_16s_final.an.YL0203cellulose_ARC.rabund<br />
archaea_16s_final.an.YL_ARC.rabund<br />
archaea_16s_final.an.YL_xylan_ARC.rabund<br />
mothur &gt; count.groups()</p>
<p>Using archaea_16s_final.an.shared as input file for the shared parameter.<br />
Eric_ARC contains 14.<br />
GZ1122_ARC contains 1780.<br />
GZ1122cellulose_ARC contains 1063.<br />
GZ_ARC contains 53.<br />
GZ_cellulose55_ARC contains 1997.<br />
GZ_xylan_ARC contains 1509.<br />
GZ_xylose_ARC contains 1241.<br />
Milk_D_ARC contains 19.<br />
Milk_E_ARC contains 434.<br />
SHX1125cellulose_ARC contains 2568.<br />
SHX1219_ARC contains 2012.<br />
SHX_ARC contains 1594.<br />
SHX_cellulose55_ARC contains 2235.<br />
SHX_xylan_ARC contains 944.<br />
SHX_xylose_ARC contains 1932.<br />
ST0303cellulose_ARC contains 1815.<br />
ST1219_ARC contains 1597.<br />
ST_ARC contains 774.<br />
ST_xylan_ARC contains 1755.<br />
TP_1201_ARC contains 1952.<br />
TP_xylan_ARC contains 1849.<br />
YL0203cellulose_ARC contains 1762.<br />
YL_ARC contains 1154.<br />
YL_xylan_ARC contains 2019.<br />
^^^^^^^^^^^^^^^^^ counts above: old run ^^^^^^^^^^^^^^^^^<br />
Using archaea_16s_final.an.shared as input file for the shared parameter.<br />
Eric_ARC contains 15.***********<br />
GZ1122_ARC contains 1815.<br />
GZ1122cellulose_ARC contains 1081.<br />
GZ_ARC contains 54.<br />
GZ_cellulose55_ARC contains 1997.<br />
GZ_xylan_ARC contains 1547.<br />
GZ_xylose_ARC contains 1245.<br />
Milk_D_ARC contains 18. *********<br />
Milk_E_ARC contains 434. ********<br />
SHX1125cellulose_ARC contains 2570.<br />
SHX1219_ARC contains 2012.<br />
SHX_ARC contains 1593.<br />
SHX_cellulose55_ARC contains 2236.<br />
SHX_xylan_ARC contains 947.<br />
SHX_xylose_ARC contains 1932.<br />
ST0303cellulose_ARC contains 1810.<br />
ST1219_ARC contains 1597.<br />
ST_ARC contains 759.<br />
ST_xylan_ARC contains 1755.<br />
TP_1201_ARC contains 1952.<br />
TP_xylan_ARC contains 1849.<br />
YL0203cellulose_ARC contains 1762.<br />
YL_ARC contains 1164.<br />
YL_xylan_ARC contains 2019.</p>
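<p>make.shared combines the .list file with the group file to build a table of reads per OTU per sample. For each label, the .list file holds the label, the number of OTUs, then each OTU as a comma-separated list of read names. count.groups then sums each sample's row. Below is a minimal Python sketch of both steps for one list line, assuming read names are unique across samples. make_shared and count_groups here are hypothetical helpers, not mothur code:</p>

```python
"""Build an OTU-by-sample count table from one list-file line and a read-to-sample map."""
from collections import defaultdict


def make_shared(list_line, group_of):
    """list_line: 'label numOTUs otu1 otu2 ...', each OTU a comma-separated list of reads."""
    fields = list_line.split()
    label, otus = fields[0], fields[2:]
    table = defaultdict(lambda: [0] * len(otus))
    for k, otu in enumerate(otus):
        for read in otu.split(","):
            table[group_of[read]][k] += 1
    return label, dict(table)


def count_groups(table):
    """Total reads per sample, as count.groups reports for a shared file."""
    return {group: sum(row) for group, row in sorted(table.items())}


group_of = {"r1": "GZ_ARC", "r2": "ST_ARC", "r3": "GZ_ARC", "r4": "ST_ARC", "r5": "ST_ARC"}
label, table = make_shared("0.03\t3\tr1,r2\tr3\tr4,r5", group_of)
print(label, table)         # -> 0.03 {'GZ_ARC': [1, 1, 0], 'ST_ARC': [1, 0, 2]}
print(count_groups(table))  # -> {'GZ_ARC': 2, 'ST_ARC': 3}
```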
<p>&nbsp;</p>
<p>mothur &gt; count.groups()</p>
<p>Using archaea_16s_final.an.shared as input file for the shared parameter.<br />
Eric_ARC contains 569.<br />
GZ1122_ARC contains 2103.<br />
GZ1122cellulose_ARC contains 1594.<br />
GZ_ARC contains 530.<br />
GZ_cellulose55_ARC contains 2001.<br />
GZ_xylan_ARC contains 1889.<br />
GZ_xylose_ARC contains 2015.<br />
Milk_D_ARC contains 598.<br />
Milk_E_ARC contains 1753.<br />
SHX1125cellulose_ARC contains 2831.<br />
SHX1219_ARC contains 2247.<br />
SHX_ARC contains 1660.<br />
SHX_cellulose55_ARC contains 2249.<br />
SHX_xylan_ARC contains 1213.<br />
SHX_xylose_ARC contains 1991.<br />
ST0303cellulose_ARC contains 1845.<br />
ST1219_ARC contains 1621.<br />
ST_ARC contains 1859.<br />
ST_xylan_ARC contains 1769.<br />
TP_1201_ARC contains 1969.<br />
TP_xylan_ARC contains 1890.<br />
YL0203cellulose_ARC contains 1785.<br />
YL_ARC contains 1285.<br />
YL_xylan_ARC contains 2025.</p>
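<p>sub.sample rarefies every sample to the same depth. Here size=759 matches ST_ARC, the smallest sample kept, and the four samples below it are eliminated, as the log shows. The sketch below assumes uniform random draws without replacement from each sample's reads. The function name, seed and toy table are made up:</p>

```python
"""Rarefy every sample in an OTU table to the same depth, dropping samples that are too small."""
import random


def sub_sample(table, size, seed=0):
    """Draw `size` reads per sample without replacement; smaller samples are eliminated."""
    rng = random.Random(seed)
    result = {}
    for group, row in table.items():
        if size > sum(row):
            print(f"{group} contains {sum(row)}. Eliminating.")
            continue
        pool = [otu for otu, n in enumerate(row) for _ in range(n)]  # one entry per read
        counts = [0] * len(row)
        for otu in rng.sample(pool, size):
            counts[otu] += 1
        result[group] = counts
    return result


table = {"GZ_ARC": [1, 0, 1], "ST_ARC": [4, 3, 3], "YL_ARC": [6, 0, 4]}
print(sub_sample(table, 5))
```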
<p>&nbsp;<br />
mothur &gt; sub.sample(shared=archaea_16s_final.an.shared, size=759)</p>
<p>Eric_ARC contains 15. Eliminating.<br />
GZ_ARC contains 54. Eliminating.<br />
Milk_D_ARC contains 18. Eliminating.<br />
Milk_E_ARC contains 434. Eliminating.<br />
Sampling 759 from each group.<br />
unique<br />
0.01<br />
0.02<br />
0.03</p>
<p>Output File Names:<br />
archaea_16s_final.an.uniquesubsample.shared<br />
archaea_16s_final.an.0.01subsample.shared<br />
archaea_16s_final.an.0.02subsample.shared<br />
archaea_16s_final.an.0.03subsample.shared<br />
mothur &gt; classify.otu(list=archaea_16s_final.an.list, name=archaea_16s_final.names, taxonomy=archaea_16s_final.taxonomy)</p>
<p>reftaxonomy is not required, but if given will keep the rankIDs in the summary file static.<br />
unique 2636<br />
0.01 1940<br />
0.02 1033<br />
0.03 634</p>
<p>Output File Names:<br />
archaea_16s_final.an.uniquecons.taxonomy<br />
archaea_16s_final.an.uniquecons.tax.summary<br />
archaea_16s_final.an.0.01cons.taxonomy<br />
archaea_16s_final.an.0.01cons.tax.summary<br />
archaea_16s_final.an.0.02cons.taxonomy<br />
archaea_16s_final.an.0.02cons.tax.summary<br />
archaea_16s_final.an.0.03cons.taxonomy<br />
archaea_16s_final.an.0.03cons.tax.summary</p>
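<p>classify.otu gives each OTU a consensus of its members' taxonomies. It walks down the ranks while enough reads agree and reports the level of agreement in parentheses. Below is a Python sketch of that majority rule, counting each read separately. The agreement threshold is a parameter (51% in this sketch; check classify.otu's cutoff option for the value actually applied), and consensus is a hypothetical name:</p>

```python
"""Majority-vote consensus taxonomy for one OTU, rank by rank."""
from collections import Counter


def consensus(lineages, cutoff=51):
    """lineages: one list of taxon names per read (confidence values already stripped).

    Stop at the first rank where the most common taxon is shared by fewer
    than `cutoff` percent of the reads.
    """
    result = []
    depth = max(len(lin) for lin in lineages)
    for rank in range(depth):
        taxa = Counter(lin[rank] for lin in lineages if len(lin) > rank)
        taxon, n = taxa.most_common(1)[0]
        pct = 100 * n / len(lineages)
        if cutoff > pct:
            break
        result.append(f"{taxon}({round(pct)})")
    return ";".join(result) + ";"


otu_reads = [
    ["Archaea", "Euryarchaeota", "Methanomicrobia"],
    ["Archaea", "Euryarchaeota", "Methanobacteria"],
    ["Archaea", "Euryarchaeota", "Methanomicrobia"],
]
print(consensus(otu_reads))  # -> Archaea(100);Euryarchaeota(100);Methanomicrobia(67);
```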
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>https://www.biofacebook.com/?feed=rss2&#038;p=494</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MetaPhlAn: Metagenomic Phylogenetic Analysis</title>
		<link>https://www.biofacebook.com/?p=474</link>
		<comments>https://www.biofacebook.com/?p=474#comments</comments>
		<pubDate>Wed, 08 Aug 2012 03:58:38 +0000</pubDate>
		<dc:creator><![CDATA[szypanther]]></dc:creator>
				<category><![CDATA[二代测序]]></category>
		<category><![CDATA[生物信息]]></category>
		<category><![CDATA[metagenome]]></category>

		<guid isPermaLink="false">http://www.biofacebook.com/?p=474</guid>
		<description><![CDATA[<p>MetaPhlAn is a computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data. MetaPhlAn relies on unique clade-specific marker genes identified from 3,000 reference genomes, allowing:</p> up to 25,000 reads-per-second (on one CPU) analysis speed (orders of magnitude faster compared to existing methods); unambiguous taxonomic assignments as the MetaPhlAn markers are [...]]]></description>
				<content:encoded><![CDATA[<p>MetaPhlAn is a computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data. MetaPhlAn relies on unique clade-specific marker genes identified from 3,000 reference genomes, allowing:</p>
<ul>
<li>an analysis speed of up to 25,000 reads per second on one CPU (orders of magnitude faster than existing methods);</li>
<li>unambiguous taxonomic assignments, since the MetaPhlAn markers are clade-specific;</li>
<li>accurate estimation of organismal relative abundance (in terms of number of cells rather than fraction of reads);</li>
<li>species-level resolution for bacterial and archaeal organisms;</li>
<li>extensive validation of the profiling accuracy on several synthetic datasets and on thousands of real metagenomes.</li>
</ul>
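<p>The third point, abundance in cells rather than in reads, follows from normalising by marker length: a long marker collects more reads than a short one from the same number of cells. Below is a toy Python sketch of that idea. It assumes one marker copy per genome and a plain mean over each clade's markers. MetaPhlAn's real estimator is more elaborate, and the names and numbers below are made up:</p>

```python
"""Toy marker-based relative abundance: normalise read hits by marker length."""


def relative_abundance(hits, marker_len, clade_markers):
    """hits/marker_len are keyed by marker ID; clade_markers maps clade to its marker IDs.

    Each clade's coverage is the mean reads-per-base over its markers; coverages are
    then rescaled to sum to 1.
    """
    coverage = {
        clade: sum(hits.get(m, 0) / marker_len[m] for m in markers) / len(markers)
        for clade, markers in clade_markers.items()
    }
    total = sum(coverage.values())
    return {clade: cov / total for clade, cov in coverage.items()}


hits = {"a1": 100, "a2": 100, "b1": 200}          # 200 reads per clade
marker_len = {"a1": 1000, "a2": 1000, "b1": 4000}  # but clade_B's marker is longer
clade_markers = {"clade_A": ["a1", "a2"], "clade_B": ["b1"]}
print(relative_abundance(hits, marker_len, clade_markers))  # about 2/3 vs 1/3, not 1/2 vs 1/2
```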
<p><a href="http://www.biofacebook.com/wp-content/uploads/2012/08/hmptree13_nl_bb.png"><img class="alignleft size-full wp-image-477" title="hmptree13_nl_bb" src="http://www.biofacebook.com/wp-content/uploads/2012/08/hmptree13_nl_bb.png" alt="" width="900" height="765" /></a></p>
]]></content:encoded>
			<wfw:commentRss>https://www.biofacebook.com/?feed=rss2&#038;p=474</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>DySC: software for greedy clustering of 16S rRNA reads</title>
		<link>https://www.biofacebook.com/?p=471</link>
		<comments>https://www.biofacebook.com/?p=471#comments</comments>
		<pubDate>Wed, 08 Aug 2012 02:56:15 +0000</pubDate>
		<dc:creator><![CDATA[szypanther]]></dc:creator>
				<category><![CDATA[二代测序]]></category>
		<category><![CDATA[metagenome]]></category>

		<guid isPermaLink="false">http://www.biofacebook.com/?p=471</guid>
		<description><![CDATA[<p id="p-2">Summary: Pyrosequencing technologies are frequently used for sequencing the 16S ribosomal RNA marker gene for profiling microbial communities. Clustering of the produced reads is an important but time-consuming task. We present Dynamic Seed-based Clustering (DySC), a new tool based on the greedy clustering approach that uses a dynamic seeding strategy. Evaluations based on the [...]]]></description>
				<content:encoded><![CDATA[<p id="p-2"><strong>Summary:</strong> Pyrosequencing technologies are frequently used for sequencing the 16S ribosomal RNA marker gene for profiling microbial communities. Clustering of the produced reads is an important but time-consuming task. We present Dynamic Seed-based Clustering (DySC), a new tool based on the greedy clustering approach that uses a dynamic seeding strategy. Evaluations based on the normalized mutual information (NMI) criterion show that DySC produces higher quality clusters than UCLUST and CD-HIT at a comparable runtime.</p>
<p id="p-3"><strong>Availability and implementation:</strong> DySC, implemented in C, is available at <a href="http://code.google.com/p/dysc/">http://code.google.com/p/dysc/</a> under the GNU GPL.</p>
<p id="p-4"><strong>Contact:</strong> <a href="mailto:bertil.schmidt@uni-mainz.de">bertil.schmidt@uni-mainz.de</a></p>
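<p>NMI scores how well a computed clustering agrees with a reference partition of the same reads: 1 for identical partitions, 0 for independent ones. Several normalisations are in use. The sketch below assumes 2 * I(A;B) / (H(A) + H(B)), which is not necessarily the variant used in the DySC evaluation:</p>

```python
"""Normalized mutual information between two clusterings of the same reads."""
from collections import Counter
from math import log


def nmi(labels_a, labels_b):
    """NMI = 2 * I(A;B) / (H(A) + H(B)), one common normalisation among several."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum(c / n * log(c / n) for c in counts.values())

    # I(A;B) = sum over (a, b) of p(a,b) * log(p(a,b) / (p(a) * p(b)))
    mi = sum(c / n * log(c * n / (ca[a] * cb[b])) for (a, b), c in joint.items())
    ha, hb = entropy(ca), entropy(cb)
    return 2 * mi / (ha + hb) if ha + hb else 1.0


print(nmi([0, 0, 1, 1], [5, 5, 7, 7]))  # same partition, different labels
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))  # independent partitions
```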
]]></content:encoded>
			<wfw:commentRss>https://www.biofacebook.com/?feed=rss2&#038;p=471</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
