Top 10 Reasons to Study Bioinformatics

The following list of reasons to do a Ph.D. or postdoc in bioinformatics or computational biology appeared on Casey Bergman’s blog.

0. Computing is the key skill set for 21st-century biology
1. Computational skills are highly transferable
2. Computing will help improve your core scientific skills
3. You should use your Ph.D./Post-Doc to develop […]

Bioinformatics for personal genome interpretation

http://bib.oxfordjournals.org/content/13/4/495.full

Key Points

Vast amounts of variation data from genome sequencing studies need to be analyzed to understand their association with various phenotypes.

Well-curated databases, reliable tools for gene prioritization and accurate methods for predicting the impact of variants will be essential for the interpretation of personal genomes.

Standard and unified protocols […]

Databases and resources for personal genome interpretation

Database | URL | Description

Short variations: SNVs, short indels

1000 Genomes | http://www.1000genomes.org | Human short variants and inferred genotypes
dbSNP | http://www.ncbi.nlm.nih.gov/projects/SNP | Short variants from all species
HapMap | http://www.hapmap.org | Human short variants […]

Editorial: Current progress in Bioinformatics 2012

In this issue, we present an annual review of progress in critical areas of biomedical computing and informatics. Our ability to collect and store biological and medical data has continued to increase at astounding rates over the last few years. As a result, the scientific and technical challenges for informatics and computing more generally have also increased. A colleague from […]

IMG: another microbial genome database (provides more information than NCBI)

IMG Home

Some alternative databases now that KEGG is paywalled

The PANTHER (Protein ANalysis THrough Evolutionary Relationships) Classification System includes a set of pathways, but few prokaryotic genomes are available in PANTHER (see its list of species).

BioCyc

BioCyc is a collection of 1962 Pathway/Genome Databases (PGDBs). Each PGDB in the BioCyc collection describes the genome and metabolic pathways of a single organism.

BioCyc is a […]

Solving a BLAST error with awk

BlastOutput.iterations.E. Invalid value(s) [9] in VisibleString [AGO_S073922.1_geneid.geneid 2 exon (s) 261 – 281 6 aa, chain – incompleted …]
[blastall] ERROR: 20110903163100/t1.fna.blast.m7Output BlastOutput.iterations.E. Invalid value(s) [9] in VisibleString [AGO_S020563.1_geneid.geneid 2 exon (s) 1646 – 1669 6 aa, chain + incompleted …]
[blastall] ERROR: 20110903163100/t1.fna.blast.m7Output BlastOutput.iterations.E. Invalid value(s) [9] in VisibleString [AGO_S075634.1_geneid.geneid 2 exon (s) 252 […]
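The awk fix itself is cut off above. As far as I can tell, the "[9]" in these messages is character code 9, a tab. The ASN.1 VisibleString type that BLAST uses when writing XML output (-m 7) does not allow tabs, and the geneid-style FASTA headers here presumably contain them. Assuming that is the cause, here is a minimal Python sketch of the same idea: it replaces tabs in header lines with spaces before BLAST is run. The function names clean_fasta_headers and clean_fasta_file are hypothetical and are not part of BLAST or any other tool.

```python
def clean_fasta_headers(lines):
    """Yield FASTA lines with tabs in header (">") lines replaced by spaces.

    Assumption: the "[9]" in the BLAST error is ASCII 9 (a tab), which the
    ASN.1 VisibleString type used for XML output (-m 7) does not accept.
    Sequence lines are passed through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            line = line.replace("\t", " ")
        yield line


def clean_fasta_file(in_path, out_path):
    """Write a copy of in_path whose header lines contain no tabs."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(clean_fasta_headers(src))
```

For example, clean_fasta_file("t1.fna", "t1.clean.fna") writes a cleaned copy (both file names are illustrative), and BLAST is then run on t1.clean.fna.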

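Blast2GO: installing and running a local database, plus simple online use

I recently needed to annotate a set of predicted genes, so I started working with Blast2GO.

The simplest approach is to use the free online database service from the official site. This requires the Java Runtime Environment (JRE), available from http://www.java.com/download.

The steps are as follows:

(1) Go to the official site at http://www.blast2go.com/b2glaunch/start-blast2go

Choose the memory size you need and click "here". If Blast2GO does not start directly from the browser, you will be prompted to save and download the blast2go.jnlp file.

(2) Then run javaws blast2go.jnlp from the command line and press Enter. The interface appears, and all that is left is clicking through the interface and running the analysis.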

##############################################

Running from the command line with a local database:

B2G4PIPE: Blast2GO without a graphical interface

1. Download the required resources from http://www.blast2go.com/b2glaunch/resources:

http://www.blast2go.com/data/blast2go/b2g4pipe_v2.5.zip

http://www.blast2go.com/data/blast2go/local_b2g_db_tutorial_0809.zip

Download the files needed for the b2g database:

http://archive.geneontology.org/latest-full/go-assocdb-data.gz

ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz

ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz

ftp://ftp.pir.georgetown.edu/databases/idmapping/idmapping.tb.gz

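(Optional, depending on your MySQL version. Newer MySQL versions no longer accept the TYPE= table option.)

Replace TYPE=MyISAM with ENGINE=MyISAM in b2g_db.sql.

Make the same replacement in go_201110-assocdb-data:

sed -i 's/TYPE=MyISAM/ENGINE=MyISAM, DEFAULT CHARACTER SET latin1/' go_201110-assocdb-data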
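If sed is not available (for example on Windows), the same substitution can be sketched in Python. This is a minimal sketch, not part of the Blast2GO tutorial, and the helper names fix_engine_syntax and fix_engine_file are hypothetical. The sketch differs from the sed command in two ways. First, it writes to a new file rather than editing in place. Second, it streams the input line by line, because the GO association dump can be very large.

```python
import re

# The old table-option syntax; \b keeps the pattern from matching inside a
# longer identifier such as "DATA_TYPE=MyISAM".
OLD_ENGINE = re.compile(r"\bTYPE=MyISAM\b")


def fix_engine_syntax(sql_text, charset=None):
    """Rewrite TYPE=MyISAM as ENGINE=MyISAM, optionally adding a default charset.

    charset=None mirrors the b2g_db.sql edit; charset="latin1" mirrors the
    sed command used for go_201110-assocdb-data.
    """
    replacement = "ENGINE=MyISAM"
    if charset:
        replacement += ", DEFAULT CHARACTER SET " + charset
    # A function replacement avoids re's special handling of backslashes.
    return OLD_ENGINE.sub(lambda match: replacement, sql_text)


def fix_engine_file(in_path, out_path, charset=None):
    """Stream a SQL dump line by line from in_path into out_path.

    latin-1 maps every byte to one character, so bytes are copied through
    unchanged; newline="" keeps the original line endings.
    """
    with open(in_path, encoding="latin-1", newline="") as src, \
            open(out_path, "w", encoding="latin-1", newline="") as dst:
        for line in src:
            dst.write(fix_engine_syntax(line, charset))
```

For example, fix_engine_file("go_201110-assocdb-data", "go_201110-assocdb-data.fixed", charset="latin1") corresponds to the sed command above. The output file name is illustrative.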

2. Edit and then run download_and_install.sh from the tutorial, or run the steps manually as follows:

3. Edit and run b2g_db.sql:

[…]

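Batch BLAST queries

The problem: I had a lot of sequences, several hundred of them, and wanted a rough idea of which microorganism each one came from. BLASTing them one at a time would be exhausting. I wanted a tool that simply reports the names of the top few BLAST hits for each sequence, with no other information.

I searched online but found no suitable software or tool. There are some tutorials on batch BLAST, such as this one, but the results they produce are extremely verbose and full of information I did not need.

Later I found that Biopython makes batch BLAST very easy. You only need to install Python and Biopython, which can be downloaded from http://www.python.org/download/ and http://www.biopython.org/wiki/Download respectively.

On Windows, just double-click the downloaded installers. Then open IDLE (Python GUI), choose "File" -> "New Window", and proceed in two steps:

Step 1: run the following code to perform the BLAST searches: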

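from Bio.Blast import NCBIWWW
from Bio import SeqIO
import os
import time

os.makedirs("xml", exist_ok=True)  # results go to xml/1.xml, xml/2.xml, ...
SeqNumber = 0
for record in SeqIO.parse("allseq.seq", "fasta"):
    SeqNumber += 1
    try:
        # Submit one online BLASTN search of this sequence against NCBI's nr database
        result_handle = NCBIWWW.qblast("blastn", "nr", record.seq)
        with open(os.path.join("xml", str(SeqNumber) + ".xml"), "w") as save_file:
            save_file.write(result_handle.read())
        print(SeqNumber, "OK!")
    except Exception:
        # Network or server error: report it and continue with the next sequence
        print(SeqNumber, "Error! Will try again later!")
# […]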
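Step 2 of the original post is cut off. Because the goal is only the names of the top few hits per sequence, here is one way to extract them from the saved files. It uses the standard library's xml.etree.ElementTree; Biopython's Bio.Blast.NCBIXML would also work. The sketch assumes the standard BLAST XML layout (BlastOutput_iterations/Iteration/Iteration_hits/Hit/Hit_def) and the xml/1.xml, xml/2.xml naming from step 1. The function names are hypothetical.

```python
import glob
import os
import xml.etree.ElementTree as ET


def top_hit_names(xml_data, n=3):
    """Return the descriptions (Hit_def) of the first n hits in a BLAST XML report.

    Hits are listed in rank order, so the first n are the best n. Only the
    first <Iteration> is read, since each saved file holds one query.
    """
    root = ET.fromstring(xml_data)
    iteration = root.find("BlastOutput_iterations/Iteration")
    if iteration is None:
        return []
    hits = iteration.findall("Iteration_hits/Hit")
    return [hit.findtext("Hit_def", default="") for hit in hits[:n]]


def summarize_directory(xml_dir="xml", n=3):
    """Print and return {file name: top n hit names} for every *.xml in xml_dir."""
    summary = {}
    for path in sorted(glob.glob(os.path.join(xml_dir, "*.xml"))):
        with open(path, "rb") as handle:
            names = top_hit_names(handle.read(), n)
        summary[os.path.basename(path)] = names
        print(os.path.basename(path), "; ".join(names))
    return summary
```

Running summarize_directory("xml", 3) after step 1 prints one line per sequence, listing its top three hit descriptions.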

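How to use BLAST+

BLAST+ has many improvements over the legacy BLAST, and NCBI strongly recommends moving from BLAST to BLAST+. Here, both BLAST and BLAST+ refer to local (standalone) programs, which are not the same thing as the batch BLAST script above. Download: NCBI BLAST+. General BLAST+ usage is as follows:

Formatting a database:

makeblastdb -in db.fasta -dbtype prot -parse_seqids -out dbname

Parameters:
-in: the sequence file to format
-dbtype: database type, prot or nucl
-parse_seqids: parse sequence IDs from the FASTA definition lines
-out: database name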

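Protein queries against a protein database (blastp):

blastp -query seq.fasta -out seq.blast -db dbname -outfmt 6 -evalue 1e-5 -max_target_seqs 10 -num_threads 8

Parameters:
-query: path and name of the input file
-out: path and name of the output file
-db: path and name of the formatted database
-outfmt: output format; there are 12 formats in total, and 6 is the tabular format, equivalent to the -m 8 format of legacy BLAST
-evalue: e-value cutoff for reported hits
-max_target_seqs: maximum number of database sequences to report per query; for tabular output use this rather than -num_descriptions, which applies only to -outfmt 0 to 4
-num_threads: number of threads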
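Tabular output is easy to post-process. The sketch below shows one way to reduce it to "the first few subjects per query". It assumes the default -outfmt 6 columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); a custom -outfmt "6 ..." column list would need a different field list. The function names are hypothetical.

```python
# Default column order of BLAST+ -outfmt 6 (the same columns as legacy -m 8).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def read_outfmt6(path):
    """Yield one dict per non-empty line of a default -outfmt 6 file."""
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:
                yield dict(zip(OUTFMT6_FIELDS, line.split("\t")))


def top_hits(rows, n=10):
    """Return {query: [first n distinct subject IDs]} in file order.

    BLAST normally lists each query's hits best first, and a subject with
    several HSPs appears on several lines; it is counted once here.
    """
    best = {}
    for row in rows:
        subjects = best.setdefault(row["qseqid"], [])
        if row["sseqid"] not in subjects and len(subjects) < n:
            subjects.append(row["sseqid"])
    return best
```

For example, top_hits(read_outfmt6("seq.blast"), 3) gives the three best-ranked subject IDs for each query in the blastp output above.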

Nucleotide queries against a nucleotide database (blastn) and nucleotide queries against a protein database (blastx) are run much like blastp:

blastn -query seq.fasta -out seq.blast -db dbname -outfmt 6 -evalue […]
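To drive these commands from Python, for example as part of the batch workflow above, you can build them as argument lists and call them with subprocess. This is a minimal sketch with hypothetical helper names. It reuses the flags shown above, but limits the number of hits with -max_target_seqs because -num_descriptions does not apply to tabular (-outfmt 6) output. The rest of the blastn command is cut off in the original post, so the defaults here are assumptions. Note that blastx needs a protein database (-dbtype prot), while blastn needs a nucleotide one (-dbtype nucl).

```python
import shutil
import subprocess


def makeblastdb_cmd(fasta, dbname, dbtype="prot"):
    """Build the makeblastdb command shown above as an argument list."""
    return ["makeblastdb", "-in", fasta, "-dbtype", dbtype,
            "-parse_seqids", "-out", dbname]


def search_cmd(program, query, db, out, evalue="1e-5", max_target_seqs=10, threads=8):
    """Build a blastp/blastn/blastx command with tabular (-outfmt 6) output."""
    if program not in ("blastp", "blastn", "blastx"):
        raise ValueError("unsupported program: " + program)
    return [program, "-query", query, "-out", out, "-db", db,
            "-outfmt", "6", "-evalue", str(evalue),
            "-max_target_seqs", str(max_target_seqs),
            "-num_threads", str(threads)]


def run(cmd):
    """Run a command; raise if the program is missing or exits with an error."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(cmd[0] + " not found on PATH")
    subprocess.run(cmd, check=True)
```

For example, run(makeblastdb_cmd("db.fasta", "dbname")) followed by run(search_cmd("blastx", "seq.fasta", "dbname", "seq.blast")) formats a protein database and searches it with nucleotide queries.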