RSeQC: quality control of RNA-seq experiments

Abstract

Motivation: RNA-seq has been extensively used for transcriptome study. Quality control (QC) is critical to ensure that RNA-seq data are of high quality and suitable for subsequent analyses. However, QC is a time-consuming and complex task, due to the massive size and versatile nature of RNA-seq data. Therefore, a convenient and comprehensive QC […]

Tutorial: Piping with samtools, bwa and bedtools

In this tutorial I hope to introduce some of the concepts behind Unix piping. Piping is a very useful way to avoid creating intermediate, use-once files.

Let's begin with a typical command for paired-end mapping with bwa:

# -t 4 is for using 4 threads/cores
bwa aln -t 4 ./hg19.fasta ./s1_1.fastq […]
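As a rough illustration of the piping idea, the sketch below connects two processes so that the first one's output streams straight into the second, with no intermediate file on disk. It is only a sketch: the helper name `pipe` and the two stand-in commands are assumptions made for this example, and they are small Python one-liners rather than bwa or samtools, so the snippet runs anywhere.

```python
import subprocess
import sys


def pipe(producer_cmd, consumer_cmd):
    """Run `producer | consumer` and return the consumer's stdout as text."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        consumer_cmd,
        stdin=producer.stdout,   # read directly from the producer's pipe
        stdout=subprocess.PIPE,
        text=True,
    )
    # Close our copy of the pipe so the producer is notified
    # if the consumer exits early.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output


# Hypothetical stand-ins for real tools: one command emits records,
# the other counts them per name.
emit = [sys.executable, "-c", "print('chr2'); print('chr1'); print('chr2')"]
count = [
    sys.executable,
    "-c",
    "import sys, collections; "
    "c = collections.Counter(line.strip() for line in sys.stdin); "
    "print(sorted(c.items()))",
]

result = pipe(emit, count)
print(result)
```

In a real workflow the two command lists would hold the actual tool invocations; the data still flows through the pipe without touching the disk.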

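Trying out Amazon EC2

Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers.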

Log in at https://console.aws.amazon.com/ec2/home

Click Launch Instance to start creating a new instance.

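Choose an operating system. For the instance type, Micro is the usual choice, because it is the only one that is free.

Then simply accept the defaults step by step. You can create and download a key pair and configure the firewall along the way, but both can also be configured after the instance has been created.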

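This is the instance I have already set up. When you create new instances, it is best to terminate (that is, delete) all test instances so that you are not charged extra.

For EC2 pricing, see http://aws.amazon.com/ec2/pricing/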

Free Tier*

As part of AWS’s Free Usage Tier, new AWS customers can get started with Amazon EC2 for free. Upon sign-up, new AWS customers receive the following EC2 services each month for one […]

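High-throughput sequencing and cloud computing

The defining feature of high-throughput sequencing (next-generation sequencing) is the enormous volume of data it produces: a single 454 run generates roughly 400M of data, and a single Illumina HiSeq run generates as much as 200G. All of that data requires a great deal of computation, and as high-throughput sequencing spreads across more fields, personal computers and workstations clearly cannot handle the processing. Some large companies and universities can use their own supercomputers; BGI, for example, has several large bioinformatics supercomputing centers, and the University of Hong Kong (HKU) has an HPC facility. But what about smaller companies and research groups?

Cloud computing is a very good fit. Cloud computing is an Internet-based form of computing in which shared software, hardware, and information are provided to computers and other devices on demand, much as an electrical grid delivers power (from Wikipedia). Put simply, you send your data over the Internet to the "cloud" and compute on it there. Google, Amazon, and Microsoft are all developing and offering cloud computing services; the one best suited to processing high-throughput sequencing data is probably Amazon's AWS.

Today I took a quick look at Amazon's cloud computing offering. It looks quite good, being both flexible and inexpensive: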

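(1) You are charged only while computing, not while idle.
(2) You can freely choose between Windows and Linux as the operating system, whereas HKU's HPC offers only Linux.
(3) It is very cheap. Taking EC2 as an example, under standard conditions one instance (roughly the computing power of an ordinary PC) costs only 0.085 USD per hour. Renting 20 machines for one day (24 hours) therefore costs only a little over 40 USD, roughly 260 RMB, which is remarkably cheap.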
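The price estimate in point (3) can be checked with a few lines of arithmetic. This is a minimal sketch using only the figures quoted in the post; the function and variable names are made up for illustration, and the exchange rate is not stated in the post but merely back-calculated from its rough 260 RMB figure.

```python
# Figures quoted in the post.
HOURLY_RATE_USD = 0.085   # one standard instance for one hour
INSTANCES = 20
HOURS = 24
QUOTED_RMB = 260          # the post's rough RMB equivalent


def rental_cost_usd(instances, hours, hourly_rate):
    """Total cost of renting `instances` machines for `hours` hours each."""
    return instances * hours * hourly_rate


total_usd = rental_cost_usd(INSTANCES, HOURS, HOURLY_RATE_USD)

# Assumption: the exchange rate is inferred from the quoted RMB amount,
# it is not given in the post.
implied_rate = QUOTED_RMB / total_usd

print(f"Total: {total_usd:.2f} USD")
print(f"Implied exchange rate: {implied_rate:.2f} RMB per USD")
```

The total comes to 40.80 USD, which matches "a little over 40 USD" in the text.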

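In fact, many people are already using cloud computing to process high-throughput sequencing data. See: here.

A new technology from biology meets a new technology from computing, and sparks fly. It is a pity that China holds the core technology in neither field and lags far behind; there is a lot of catching up to do!

Reposted from: 有个博客 [ http://www.yelinsky.com/blog/ ]

Permalink: http://www.yelinsky.com/blog/archives/349.html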

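Deploying Apache and Django on Amazon EC2

EC2 is the elastic cloud computing service provided by Amazon (Amazon.com); Apache is a cross-platform web server that lets programs written in languages such as Python, PHP, and Perl run on the server; Django is a web framework that makes writing Python web applications simpler; Amazon S3 is Amazon's cloud storage service; combined, Amazon EC2 and Amazon S3 offer nearly unlimited storage and computing capacity.

Put together, these make it possible to build, in easy-to-use Python, a website that offers large-scale data processing, which seems like it could be useful for processing high-throughput sequencing data.

The steps for deploying Apache and Django on Amazon EC2 are as follows:

0. First, create an EC2 instance on AWS running Ubuntu Linux. You can pick Ubuntu's official AMI directly from the Community AMIs; its ID is ami-cef405a7. Creating an EC2 instance is not complicated, so I will not go into detail here. Note: when you log in over SSH afterwards, the user name is ubuntu, not ec2-user and not root.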

1. Install Apache:
    sudo apt-get install apache2

2. Download and install Django:
    wget http://www.djangoproject.com/download/1.3/tarball/
The downloaded file is named index.html, so rename it:
    mv index.html Django-1.3.tar.gz
Extract it:
    tar xzvf Django-1.3.tar.gz
Install it:
    cd Django-1.3
    sudo python setup.py install

3. Install mod_python:
    sudo apt-get install libapache2-mod-python

4. Restart Apache:
    sudo /etc/init.d/apache2 restart

5. […]

blast2go

shenzy@shenzy-ubuntu:/mnt/disk_xp/linux_shenzy$ wget http://www.blast2go.com/data/blast2go/local_b2g_db_tutorial_0809.zip
--2012-06-25 16:29:42--  http://www.blast2go.com/data/blast2go/local_b2g_db_tutorial_0809.zip
Resolving www.blast2go.com... 85.25.117.102
Connecting to www.blast2go.com|85.25.117.102|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 90614 (88K) [application/zip]
Saving to: `local_b2g_db_tutorial_0809.zip'

100%[=======================================================================================================================================>] 90,614 52.5K/s in 1.7s

shenzy@shenzy-ubuntu:/mnt/disk_xp/linux_shenzy/local_b2g_db_tutorial$ mysql -u shenzy -p blast2go < go_201206-assocdb-data

Command-line tools for processing biological sequencing data. Barcode demultiplexing, adapter trimming, etc.

http://code.google.com/p/ea-utils/

 

Primarily written to support an Illumina-based pipeline, but should work with any FASTQs.

Overview:

fastq-mcf

Scans a sequence file for adapters, and, based on a log-scaled threshold, determines a set of clipping parameters and performs clipping. Also does skewing detection and quality filtering.
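To make the idea of adapter clipping concrete, here is a deliberately simple sketch that removes an adapter from the 3' end of a read by exact matching. It is not fastq-mcf's algorithm: it does not reproduce the log-scaled threshold, skew detection, or quality filtering described above. The function name `clip_adapter`, the `min_overlap` parameter, and the adapter string are all assumptions chosen for this example.

```python
def clip_adapter(read, adapter, min_overlap=4):
    """Clip an exactly matching adapter from the 3' end of a read.

    Handles two cases: the whole adapter occurs somewhere in the read,
    or only a prefix of the adapter (at least `min_overlap` bases long)
    is present at the very end of the read.
    """
    # Case 1: the full adapter is present; keep everything before it.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]

    # Case 2: the read ends partway through the adapter.
    # Try the longest possible partial match first.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]

    # No adapter found; return the read unchanged.
    return read


# Example adapter sequence, used here purely for illustration.
ADAPTER = "AGATCGGAAGAGC"

full = clip_adapter("ACGTAGATCGGAAGAGCTTT", ADAPTER)
partial = clip_adapter("ACGTACGTAGATCGGAAG", ADAPTER)
untouched = clip_adapter("ACGTACGT", ADAPTER)
print(full, partial, untouched)
```

Exact matching keeps the example short; a practical tool would also need to tolerate sequencing errors in the adapter.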

fastq-multx

Demultiplexes a fastq. Capable of auto-determining barcode […]

Venn diagram online software

http://bioinformatics.psb.ugent.be/webtools/Venn/

An approximate workflow for repeating the phylogenetic analysis of strawberry

An approximate workflow for repeating the phylogenetic analysis of strawberry and other plant genomes would consist of the following steps:
1) Obtain protein and nucleotide sets from the identified sources. Extract subregions of protein and nucleotide sequences specified in the gene identifiers spreadsheet and group into files by family.
2) Search nucleotide sequences for papaya […]

ELPH : Estimated Locations of Pattern Hits

Overview

ELPH is a general-purpose Gibbs sampler for finding motifs in a set of DNA or protein sequences. The program takes as input a set containing anywhere from a few dozen to thousands of sequences, and searches through them for the most common motif, […]