January « 2013 « 小生这厢有礼了(BioFaceBook Personal Blog)

Install genometools

The 'new' error message refers to a nonexistent Cairo library on your system, which is needed for the AnnotationSketch component of GenomeTools. If you do not need this, do a 'make cleanup' and recompile with the additional make option 'cairo=no', e.g. 'make errorcheck=no cairo=no'. This will disable support for AnnotationSketch and remove the cairo […]

R: combining columns freely with data.frame(x, y)

> protein_data_B30_min <- protein_data[1:2548, 10:12]
> protein_data_M30_min <- protein_data[1:2548, 19:21]
> protein_data_30_min <- data.frame(protein_data_B30_min, protein_data_M30_min)
> protein_data_30_min[1:2, ]
        B30     B30.1 B30.2    Mgo30  Mgo30.1  Mgo30.2
1  870.5042  867.0873     0 1086.828 1481.228 2726.929
2 5167.6455 4646.3450     0 4409.320 3017.866 3216.642
[…]

DSK: k-mer counting with very low memory usage

Summary: Counting all the k-mers (substrings of length k) in DNA/RNA sequencing reads is the preliminary step of many bioinformatics applications. However, state of the art k-mer counting methods require that a large data structure resides in memory. Such structure typically grows with the number of distinct k-mers to count.

We present a […]
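To make the task concrete, here is a minimal sketch of what "counting all the k-mers" means. It is a naive in-memory counter, not DSK's low-memory algorithm; the function name `count_kmers` and the example reads are made up for illustration. Its memory grows with the number of distinct k-mers, which is exactly the limitation the summary describes.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across an iterable of reads (naive, all in memory)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        # A read of length n contributes n - k + 1 k-mers (none if n < k).
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical toy reads, not real sequencing data.
    example_reads = ["ACGTAC", "GTACGT"]
    for kmer, n in sorted(count_kmers(example_reads, 3).items()):
        print(kmer, n)
```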

Free VPN

http://www.afreevpn.com/

www.bestfreevpn.com

Circos for comparative genomes

Circos:
nucmer --prefix=refBAV1_qryVScds Dehalococcoides_BAV1.fasta Dehalococcoides_VS.cds.fasta
show-tiling -i 80 -c refBAV1_qryVScds.delta > refBAV1_qryVScds.tiling
awk -F " " '{print "chr1" "\t" $1 "\t" $2}' refBAV1_qryVScds.tiling > Dehalococcoidessp.VScds.gene_tableBAV1.txt
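The awk step above takes the first two whitespace-separated columns of each tiling row and prefixes a chromosome label. A rough Python equivalent is sketched below. The function name `tiling_to_circos` is hypothetical, and the sketch assumes that any line starting with ">" is a header rather than a tiling row, so it skips those lines (the awk one-liner would print them too). Check this assumption against your own .tiling file.

```python
def tiling_to_circos(lines, chrom="chr1"):
    """Yield 'chrom<TAB>col1<TAB>col2' for each tiling row, mirroring the awk step."""
    for line in lines:
        fields = line.split()
        # Assumption: header lines start with '>' and carry no coordinates.
        if len(fields) < 2 or line.startswith(">"):
            continue
        yield f"{chrom}\t{fields[0]}\t{fields[1]}"


if __name__ == "__main__":
    # Made-up rows for illustration only; real rows come from show-tiling.
    sample = [">some_reference 1000 bases", "10 200 5 191 95.0 98.5 + gene1"]
    for row in tiling_to_circos(sample):
        print(row)
```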

shenzy@shenzy-ubuntu:/winxp_disk2/shenzy/circos/circos-tutorials-0.62/tutorials/8/1$ perl ../../../../circos-0.62-1/bin/circos -conf circos.conf

shenzy@shenzy-ubuntu:/winxp_disk2/shenzy/circos/circos-tutorials-0.62/tutorials/8/1$ ll
total 16001
-rwxrwxrwx 1 root root 528519 2012-12-27 16:11 circos1.png
-rwxrwxrwx 1 root root 1065925 2012-12-27 14:33 circos2.png
-rwxrwxrwx 1 root root 1230394 […]

How to measure codon usage bias

The codon adaptation index (CAI) is one measure of codon usage bias. To compute the CAI of a gene, a reference table of RSCU (relative synonymous codon usage) values is first compiled from highly expressed genes.

CodonW is a program that calculates it; you can download it from: http://codonw.sourceforge.net/. There is also a PhD thesis associated with it.

shenzy@shenzy-ubuntu:~/Downloads/CondonW/codonW$ codonw input.dat -all_indices […]
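The CAI calculation described above can be sketched in a few lines. This is an illustration under stated assumptions, not CodonW's implementation. It uses the usual textbook definitions: RSCU is a codon's count divided by the mean count of its synonymous codons, the relative adaptiveness w is RSCU divided by the largest RSCU for that amino acid, and CAI is the geometric mean of w over the gene's codons. The `SYNONYMOUS` table is a hypothetical two-amino-acid subset, and the reference counts in the usage example are made up.

```python
import math

# Hypothetical, deliberately tiny synonymous-codon table (two amino acids only);
# a real analysis would cover the full genetic code.
SYNONYMOUS = {
    "Lys": ["AAA", "AAG"],
    "Phe": ["TTT", "TTC"],
}


def rscu(ref_counts):
    """RSCU = codon count / mean count over that amino acid's synonymous codons."""
    values = {}
    for codons in SYNONYMOUS.values():
        total = sum(ref_counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid absent from the reference set
        mean = total / len(codons)
        for c in codons:
            values[c] = ref_counts.get(c, 0) / mean
    return values


def relative_adaptiveness(rscu_values):
    """w = RSCU of a codon / largest RSCU among its synonymous codons."""
    w = {}
    for codons in SYNONYMOUS.values():
        present = [c for c in codons if c in rscu_values]
        if not present:
            continue
        best = max(rscu_values[c] for c in present)
        for c in present:
            w[c] = rscu_values[c] / best
    return w


def cai(sequence, w):
    """Geometric mean of w over the gene's codons; codons without a positive w are skipped."""
    seq = sequence.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    logs = [math.log(w[c]) for c in codons if w.get(c, 0) > 0]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    # Made-up codon counts standing in for a set of highly expressed genes.
    reference = {"AAA": 10, "AAG": 30, "TTT": 20, "TTC": 20}
    weights = relative_adaptiveness(rscu(reference))
    print(round(cai("AAAAAGTTT", weights), 4))
```

A gene that uses only the most frequent codon for each amino acid scores 1.0 under this definition; rarer synonymous codons pull the value toward 0.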

A step-by-step guide to transcriptome differential expression analysis with TopHat and Cufflinks

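Today a classmate recommended a Nature Protocols paper on transcriptome differential expression analysis. As usual, I skimmed the figures and tables before reading it properly, and frankly the paper struck me as a bit "unconventional". It is a hands-on protocol for transcriptome differential expression analysis with Bowtie, TopHat and Cufflinks. It explains in detail what to analyze at each step, which software to use, and the relevant commands and parameters.

Following the workflow described in the paper, transcriptome analysis, whether on strand-specific RNA-seq data or on non-strand-specific data, mainly consists of the following parts:

1) Read mapping. Two programs are recommended here: Bowtie and TopHat (unlike Bowtie or bwa, TopHat can identify alternative splicing of transcripts).

2) Transcript assembly (with Cufflinks), comparison of the transcripts with the existing genome annotation (with Cuffcompare), merging (with Cuffmerge), and differential expression analysis of the transcripts (with Cuffdiff).

Two figures from the paper are attached below for a quick overview of the analysis: Figure 1 shows the software that may be used in transcriptome analysis and what each tool does, and Figure 2 shows the general workflow of transcript analysis.

Figure 1

Figure 2

The commands and parameters used by these tools during the analysis are not reproduced here; please read the original paper directly.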

Cole Trapnell, Adam Roberts, Loyal Goff, Geo Pertea, Daehwan Kim, David R Kelley, Harold Pimentel, Steven L Salzberg, John L Rinn & Lior Pachter. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols 7, 562–578 […]

Glossary of common terms in Solexa and HiSeq sequencing

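Among second-generation sequencing technologies, Solexa and its upgraded version HiSeq are currently the most widely used. To help PLoB readers better understand the concepts around Solexa, here is an article found online, "Explanation of common terms in Solexa sequencing technology"; the source link is given at the end. For more information you are welcome to join PLoB's 2000-member bioinformatics QQ group (group number: 235461986), where sequencing and bioinformatics questions can be discussed. The explanations follow directly below. You can also refer to the schematic above to understand the basic structure of Solexa and HiSeq.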

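SBS: Sequencing by synthesis. Each SBS cycle extends the strand by one base and takes about 70 minutes.

Run: A single on-instrument sequencing run, which can yield anywhere from 4G to 75G of sequencing throughput.

Lane: A single lane. Each lane physically separates the sequencing samples; one run can load up to 8 lanes at the same time.

Channel: Synonym for Lane.

Tile: Each lane holds 2 columns of tiles, 120 tiles in total. Each tile carries a very large number of cluster binding sites.

Cluster: In Solexa sequencing, bridge PCR is used to generate DNA clusters; only a DNA cluster produces a fluorescent spot bright enough for the CCD to resolve.

Index: A tag. In Solexa multiplexed sequencing, indexes are used to tell samples apart. After regular sequencing finishes, 7 additional cycles are run on the index portion; by identifying the index, 12 different samples can be distinguished within one lane.

Barcode: Synonym for Index.

Fasta: A sequence storage format. In a FASTA file, the first line of each sequence starts with ">", followed by the sequence ID (a unique identifier) and a description of the sequence. The sequence itself starts on the second line: a sequence shorter than 61 nt fits on one line; a longer sequence is stored 61 nt per line, with the remainder (fewer than 61 nt) on the last line. The next sequence starts on a new line, again with ">" and its sequence ID, and so on.

Fastq: A file format used in Solexa sequencing that records the base quality of each sequenced read. The first line starts with "@", followed by a description of the sequence; the second line is the sequence itself; the third line starts with "+", followed by the same description as on the first line; the fourth line gives the sequencing quality value for each base of the sequence on the second line.

PF%: The percentage of clusters that meet the sequencing quality standard (Multiplexed Sequencing); it is tied to sequencing throughput.

Read: Solexa reactions take place in clusters; each cluster corresponds to one DNA sequence fragment, which is called a read.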
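The four-line Fastq layout described above maps directly onto a small parser. The sketch below is only an illustration of that layout: the function name `read_fastq` and the example record are made up, and it assumes each record occupies exactly four lines with no wrapping of the sequence or quality strings. It does not interpret the quality characters, since the encoding differs between pipeline versions.

```python
def read_fastq(lines):
    """Yield (description, sequence, quality) from FASTQ lines, four lines per record."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, None)
        plus = next(it, None)
        qual = next(it, None)
        if qual is None:
            raise ValueError("truncated FASTQ record")
        # Line 1 must start with '@' and line 3 with '+'.
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        # Line 4 carries one quality character per base of line 2.
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:], seq, qual


if __name__ == "__main__":
    # Made-up record for illustration only.
    example = ["@read1 example\n", "ACGT\n", "+read1 example\n", "IIII\n"]
    for name, seq, qual in read_fastq(example):
        print(name, seq, qual)
```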

Source of the definitions and figure: http://www.igenomics.com.cn:7001/ajgene/jsp/ajweb/News.jsp?cid=C47825F27EC00001B8BF8B8D11C01D10

[…]

Drawing Venn diagrams in R

> install.packages('plotrix')
Installing package(s) into ‘/home/shenzy/R/x86_64-pc-linux-gnu-library/2.15’
(as ‘lib’ is unspecified)
trying URL 'http://cran.csiro.au/src/contrib/plotrix_3.4-5.tar.gz'
Content type 'application/x-gzip' length 211113 bytes (206 Kb)
opened URL
==================================================
downloaded 206 Kb
* installing *source* package ‘plotrix’ …
** package ‘plotrix’ successfully unpacked and MD5 sums checked
** R
** data
** demo
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building […]
