NGS « 小生这厢有礼了(BioFaceBook Personal Blog)

Microbial Community Analysis GUI – Bioconductor

http://www.bioconductor.org/packages/release/bioc/html/mcaGUI.html

mcaGUI Microbial Community Analysis GUI

Bioconductor version: Release (2.10)

Microbial community analysis GUI for R using gWidgets.

Author: Wade K. Copeland, Vandhana Krishnan, Daniel Beck, Matt Settles, James Foster, Kyu-Chul Cho, Mitch Day, Roxana Hickey, Ursel M.E. Schutte, Xia Zhou, Chris Williams, Larry J. Forney, Zaid Abdo, Poor Man’s GUI (PMG) base code by […]

RSeQC: quality control of RNA-seq experiments

Abstract

Motivation: RNA-seq has been extensively used for transcriptome study. Quality control (QC) is critical to ensure that RNA-seq data are of high quality and suitable for subsequent analyses. However, QC is a time-consuming and complex task, due to the massive size and versatile nature of RNA-seq data. Therefore, a convenient and comprehensive QC […]

Bioinformatics for personal genome interpretation

http://bib.oxfordjournals.org/content/13/4/495.full

Key Points

Vast amounts of variation data from genome sequencing studies need to be analyzed to understand its association with various phenotypes.

Well-curated databases, reliable tools for gene prioritization and accurate methods for predicting the impact of variants will be essential for the interpretation of personal genomes.

Standard and unified protocols […]

Tutorial: Piping with samtools, bwa and bedtools

In this tutorial I hope to introduce some of the concepts behind Unix piping. Piping is a very useful feature for avoiding the creation of intermediate, use-once files.

Let's begin with a typical command to do paired-end mapping with bwa:

# -t 4 tells bwa to use 4 threads/cores
bwa aln -t 4 ./hg19.fasta ./s1_1.fastq […]
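To see what avoiding a use-once file looks like in miniature, here is a small sketch that uses only standard Unix tools rather than bwa or samtools, so it runs anywhere. The two-read FASTQ and the variable names are made up for illustration; this is not the pipeline from the tutorial.

```shell
#!/bin/sh
# Made-up two-read FASTQ, used only for this illustration.
fastq='@read1
ACGT
+
IIII
@read2
ACGTAC
+
IIIIII'

# Without piping the two steps together: the sequence lines (every 4th line,
# starting at line 2) go to a temporary file that is read once and deleted.
tmp=$(mktemp)
printf '%s\n' "$fastq" | awk 'NR % 4 == 2' > "$tmp"
n_file=$(wc -l < "$tmp" | tr -d ' ')
rm -f "$tmp"

# With piping: the same two steps are chained and no file is written.
n_pipe=$(printf '%s\n' "$fastq" | awk 'NR % 4 == 2' | wc -l | tr -d ' ')

echo "reads counted via temp file: $n_file"
echo "reads counted via pipe:      $n_pipe"
```

Both routes count the same reads; the piped version just never touches the disk, which is the whole appeal once the intermediate files are many gigabytes.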

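blast2go: installing and running a local database, plus simple online use

Recently I needed to annotate some predicted genes, so I started working with blast2go:

The simplest way: use the free online database access provided by the official site (this requires the Java Runtime Environment (JRE), available from http://www.java.com/download).

The steps are:

(1) Go to the official site, http://www.blast2go.com/b2glaunch/start-blast2go

Choose the appropriate memory size and click "here". If it cannot be launched directly online, you will be prompted to save and download the blast2go.jnlp file.

(2) Then run javaws blast2go.jnlp on the command line and press Enter. The interface appears, and the rest is just clicking through it and running the analysis!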

##############################################

Running from the command line with a local database:

B2G4PIPE – Blast2GO without graphical interface

1. Download the required resources from http://www.blast2go.com/b2glaunch/resources

http://www.blast2go.com/data/blast2go/b2g4pipe_v2.5.zip

http://www.blast2go.com/data/blast2go/local_b2g_db_tutorial_0809.zip

Download the files needed for the b2g database:

http://archive.geneontology.org/latest-full/go-assocdb-data.gz

ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz

ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz

ftp://ftp.pir.georgetown.edu/databases/idmapping/idmapping.tb.gz

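(Optional, depending on your MySQL version)

In b2g_db.sql, replace TYPE=MyISAM with ENGINE=MyISAM

Make the same kind of replacement in go_201110-assocdb-data:

sed -i 's/TYPE=MyISAM/ENGINE=MyISAM, DEFAULT CHARACTER SET latin1/' go_201110-assocdb-data

2. Edit and then run download_and_install.sh from the tutorial, or run the steps by hand as follows:

3. Edit and run b2g_db.sql: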

[…]
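Before running the TYPE=MyISAM substitution from step 1 on a large dump, it can be worth trying it on a throwaway file. The sketch below does that; the one-line CREATE TABLE statement is a hypothetical stand-in, not the real Blast2GO or GO schema, and the result is written to standard output instead of editing in place.

```shell
#!/bin/sh
# Hypothetical one-line stand-in for the real SQL dump.
sample=$(mktemp)
printf '%s\n' 'CREATE TABLE demo (id INT) TYPE=MyISAM;' > "$sample"

# Same substitution as in the post, but without -i, so the file is untouched.
fixed=$(sed 's/TYPE=MyISAM/ENGINE=MyISAM, DEFAULT CHARACTER SET latin1/' "$sample")
printf '%s\n' "$fixed"

# Count the lines that still use the old keyword (0 means nothing is left to fix).
remaining=$(printf '%s\n' "$fixed" | grep -c 'TYPE=MyISAM' || true)
echo "lines still using TYPE=MyISAM: $remaining"

rm -f "$sample"
```

Once the output looks right, the in-place form with -i can be applied to the actual files.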

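High-throughput sequencing and cloud computing

The defining feature of high-throughput (next-generation) sequencing is the massive amount of data it produces: a single 454 run yields about 400M of data, and a single Illumina HiSeq run yields as much as 200G! All of this data inevitably demands a great deal of computation, and as high-throughput sequencing spreads across fields, personal computers and workstations clearly cannot keep up with this kind of data processing. Some large companies and universities can use their own supercomputers; BGI, for example, has several large bioinformatics supercomputing centers, and HKU has an HPC. But what about smaller companies and research groups?

Cloud computing is a very good fit. Cloud computing is an Internet-based style of computing in which shared hardware and software resources and information are provided on demand to computers and other devices. The whole arrangement works much like an electrical grid (from Wikipedia). Put simply, you can send your data over the Internet into the "cloud" and do the computation there. Google, Amazon, and Microsoft are all currently developing and offering cloud computing services; the one best suited to high-throughput sequencing data processing is probably Amazon's AWS.

Today I took a quick look at the cloud computing Amazon offers and found it quite good, flexible and cheap:

(1) You pay only while you are computing, not when idle.
(2) You can freely choose between Windows and Linux as the operating system, whereas HKU's HPC offers only Linux...
(3) It is very cheap. Taking EC2 as an example, under standard conditions one Instance (roughly the computing power of an ordinary PC) costs only 0.085 USD per hour. So renting 20 machines for one day (24 hours) comes to just over 40 USD, roughly 260 RMB, which is remarkably cheap.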
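The arithmetic in point (3) is easy to check. The sketch below uses only the figures quoted in the post (20 instances, 24 hours, 0.085 USD per instance-hour); that rate is the one given here and should not be taken as current pricing, and the variable names are purely illustrative.

```shell
#!/bin/sh
# Figures quoted in the post; the hourly rate is not current AWS pricing.
instances=20
hours=24
rate_usd=0.085

# Total cost = instances x hours x hourly rate. LC_ALL=C keeps "." as the
# decimal separator regardless of the system locale.
cost_usd=$(LC_ALL=C awk -v n="$instances" -v h="$hours" -v r="$rate_usd" \
    'BEGIN { printf "%.2f", n * h * r }')

echo "$instances instances for $hours hours: $cost_usd USD"
```

That works out to 480 instance-hours, which matches the "just over 40 USD" figure in the text.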

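In fact, many people are already using cloud computing to process high-throughput sequencing data. See: here.

A new technology from biology meets a new technology from computing, and sparks fly. It is a bit of a pity that China has not mastered the core technology in either field and lags far behind; we need to catch up!

Reposted from: 有个博客 [ http://www.yelinsky.com/blog/ ]

Permalink: http://www.yelinsky.com/blog/archives/349.html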

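Deploying Apache and Django on Amazon EC2

EC2 is the elastic cloud computing service offered by Amazon (Amazon.com); Apache is a cross-platform web server that lets programs written in Python, PHP, Perl, and other languages run on the server; Django is a web framework that makes writing Python web applications much simpler; Amazon S3 is Amazon's cloud storage service. Combined, Amazon EC2 and Amazon S3 can provide almost unlimited storage and almost unlimited computing power.

Put all of this together and you can use simple, easy-to-use Python to build a website that offers large-scale data processing, which I think should be of some use for high-throughput sequencing data processing.

Here are the steps for deploying Apache and Django on Amazon EC2:

0. First, create an EC2 Instance on AWS running Ubuntu Linux. You can pick Ubuntu's official AMI directly from the Community AMIs; its ID is ami-cef405a7. Creating an EC2 Instance is not complicated, so I will not go into detail here. Note: when you log in over SSH afterwards, the user name is ubuntu, not ec2-user and not root.

1. Install Apache
    sudo apt-get install apache2

2. Download and install Django
    wget http://www.djangoproject.com/download/1.3/tarball/
The downloaded file is named index.html, so rename it:
    mv index.html Django-1.3.tar.gz
Unpack it:
    tar xzvf Django-1.3.tar.gz
Install it:
    cd Django-1.3
    sudo python setup.py install

3. Install mod_python
    sudo apt-get install libapache2-mod-python

4. Restart Apache
    sudo /etc/init.d/apache2 restart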

5. […]

Command-line tools for processing biological sequencing data. Barcode demultiplexing, adapter trimming, etc.

http://code.google.com/p/ea-utils/

Primarily written to support an Illumina based pipeline – but should work with any FASTQs.

Overview: fastq-mcf

Scans a sequence file for adapters, and, based on a log-scaled threshold, determines a set of clipping parameters and performs clipping. Also does skewing detection and quality filtering.

fastq-multx

Demultiplexes a fastq. Capable of auto-determining barcode […]

Venn diagram online software

http://bioinformatics.psb.ugent.be/webtools/Venn/

An approximate workflow for repeating the phylogenetic analysis of strawberry

An approximate workflow for repeating the phylogenetic analysis of strawberry and other plant genomes would consist of the following steps: 1) Obtain protein and nucleotide sets from the identified sources. Extract subregions of protein and nucleotide sequences specified in the gene identifiers spreadsheet and group into files by family. 2) Search nucleotide sequences for papaya […]
