Top 10 Reasons to Study Bioinformatics

The following list of reasons to do a Ph.D. or postdoc in bioinformatics or computational biology appeared on Casey Bergman’s blog.

0. Computing is the key skill set for 21st century biology
1. Computational skills are highly transferable
2. Computing will help improve your core scientific skills
3. You should use your Ph.D./Post-Doc to develop […]

Bioinformatics for personal genome interpretation

http://bib.oxfordjournals.org/content/13/4/495.full

Key Points

Vast amounts of variation data from genome sequencing studies need to be analyzed to understand its association with various phenotypes.

Well-curated databases, reliable tools for gene prioritization and accurate methods for predicting the impact of variants will be essential for the interpretation of personal genomes.

Standard and unified protocols […]

Databases and resources for personal genome interpretation

 

Database | URL | Description
Short variations (SNVs, short indels)
1000 Genomes | http://www.1000genomes.org | Human short variants and inferred genotypes
dbSNP | http://www.ncbi.nlm.nih.gov/projects/SNP | Short variants from all species
HapMap | http://www.hapmap.org | Human short variants […]

Editorial: Current progress in Bioinformatics 2012

In this issue, we present an annual review of progress in critical areas of biomedical computing and informatics. Our ability to collect and store biological and medical data has continued to increase at astounding rates over the last few years. As a result, the scientific and technical challenges for informatics and computing more generally have also increased. A colleague from […]

IMG: another microbial genome database (more information than NCBI)

IMG Home

Tutorial: Piping with samtools, bwa and bedtools

In this tutorial I hope to introduce some of the concepts behind Unix piping. Piping is a very useful way to avoid creating intermediate files that are used only once.

Let's begin with a typical command for paired-end mapping with bwa:

# -t 4 is for using 4 threads/cores
bwa aln -t 4 ./hg19.fasta ./s1_1.fastq […]
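The piping idea itself, independent of bwa or samtools, can be sketched in Python: one process writes to a pipe and the next reads from it, so no intermediate file is ever created. This is only an illustration under stated assumptions. The helper name pipe_two and the two stand-in commands are hypothetical, and small Python one-liners are used in place of the real mapping tools so the sketch runs anywhere.

```python
import subprocess
import sys


def pipe_two(producer_cmd, consumer_cmd):
    """Run the equivalent of `producer | consumer` and return the consumer's output."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        consumer_cmd, stdin=producer.stdout, stdout=subprocess.PIPE, text=True
    )
    # Close our copy of the pipe so the producer sees SIGPIPE if the consumer exits early.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output


# Hypothetical stand-ins: the producer emits two unsorted lines, the consumer sorts them.
producer_cmd = [sys.executable, "-c", "print('read2'); print('read1')"]
consumer_cmd = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.writelines(sorted(sys.stdin))",
]

result = pipe_two(producer_cmd, consumer_cmd)
print(result, end="")
```

In a real workflow the two command lists would hold the actual tool invocations; the data still flows directly from one program to the next without touching disk.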

Some alternative databases now that KEGG is paywalled

The PANTHER (Protein ANalysis THrough Evolutionary Relationships) Classification System has a set of pathways. Not many prokaryotic genomes are available at PANTHER. List of species here.

BioCyc

BioCyc is a collection of 1962 Pathway/Genome Databases (PGDBs). Each PGDB in the BioCyc collection describes the genome and metabolic pathways of a single organism.

BioCyc is a […]

Fixing a BLAST error with awk

BlastOutput.iterations.E. Invalid value(s) [9] in VisibleString [AGO_S073922.1_geneid.geneid 2 exon (s) 261 – 281 6 aa, chain – incompleted …]
[blastall] ERROR: 20110903163100/t1.fna.blast.m7Output BlastOutput.iterations.E. Invalid value(s) [9] in VisibleString [AGO_S020563.1_geneid.geneid 2 exon (s) 1646 – 1669 6 aa, chain + incompleted …]
[blastall] ERROR: 20110903163100/t1.fna.blast.m7Output BlastOutput.iterations.E. Invalid value(s) [9] in VisibleString [AGO_S075634.1_geneid.geneid 2 exon (s) 252 […]
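The fix used in the post is not visible in this excerpt. One plausible reading, offered here only as an assumption, is that the [9] in the message is character code 9 (a tab) inside the FASTA header lines, which the XML output cannot store in a VisibleString. Under that assumption, a minimal Python sketch of the cleanup replaces tabs in header lines with spaces before running BLAST. The function name clean_deflines is hypothetical.

```python
def clean_deflines(lines):
    """Yield FASTA lines with tabs in header lines replaced by single spaces.

    Assumption: the invalid character reported by BLAST is a tab in the header.
    Sequence lines are passed through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            yield line.replace("\t", " ")
        else:
            yield line


# Small in-memory example; the header text here is made up for illustration.
example = [">gene1\t2 exon (s)\n", "ACGTACGT\n"]
cleaned = list(clean_deflines(example))
print("".join(cleaned), end="")
```

To clean a real file, the same generator could be fed an open file handle and its output written to a new FASTA file.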

Sequence length distribution histogram

Step 1: the summary.seqs command in mothur makes it easy to get the length distribution

mothur > summary.seqs(fasta=AMIgene_11a.pep)

            Start  End  NBases  Ambigs  Polymer  NumSeqs
Minimum:    1      20   20      16      4        1
2.5%-tile:  1      55   55      42      8        35
25%-tile:   1      144  144     112     14       350
Median:     1      242  242     189     17       699
75%-tile:   1      380  380     293     21       1048
97.5%-tile: 1      828  828     646     33       1363
Maximum:    […]
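For readers without mothur, the same kind of length information can be gathered with a few lines of plain Python. This is a rough sketch, not what the post does: the function names and the bin width of 50 are illustrative choices, and the tiny in-memory FASTA is made up.

```python
from collections import Counter


def sequence_lengths(fasta_lines):
    """Return the length of every sequence in FASTA-formatted lines."""
    lengths = []
    current = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def length_histogram(lengths, bin_width=50):
    """Count sequences per length bin; keys are the lower edge of each bin."""
    return Counter((n // bin_width) * bin_width for n in lengths)


def print_histogram(hist, bin_width=50):
    """Print one text bar per bin, shortest lengths first."""
    for start in sorted(hist):
        print(f"{start:>5}-{start + bin_width - 1:<5} {'#' * hist[start]}")


# Made-up example: three sequences of length 6, 2 and 60.
example = [">seq1", "MKT", "AYI", ">seq2", "MK", ">seq3", "M" * 60]
lengths = sequence_lengths(example)
hist = length_histogram(lengths, bin_width=50)
print_histogram(hist)
```

For a real file, passing an open file handle to sequence_lengths works the same way as the list used here.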

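Batch BLAST queries

The problem: I have a great many sequences, several hundred, and want a rough idea of which microorganism each one comes from. BLASTing them one at a time would be exhausting. All I want is a tool that tells me the names of the top few BLAST hits for each sequence, with no other information.

I searched online and found no suitable software or tool. There are some tutorials on batch BLAST, such as this one, but the results they produce are extremely cumbersome and full of information I do not need.

I later found that Biopython makes batch BLAST very simple. Just install Python and Biopython first; their download pages are: http://www.python.org/download/ http://www.biopython.org/wiki/Download

The Windows versions install with a double-click after downloading, which is very easy. Then open IDLE (Python GUI), choose "File" -> "New Window", and proceed in the following two steps:

Step 1: run the code below to perform the BLAST searches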

from Bio.Blast import NCBIWWW
from Bio import SeqIO
import time

SeqNumber = 0
for record in SeqIO.parse("allseq.seq", "fasta"):
    SeqNumber += 1
    try:
        # Submit one sequence to NCBI BLAST over the web
        result_handle = NCBIWWW.qblast("blastn", "nr", record.seq)
        # Save the raw XML result as xml\<number>.xml (the xml folder must already exist)
        save_file = open('xml\\' + str(SeqNumber) + '.xml', 'w')
        save_file.write(result_handle.read())
        save_file.close()
        print(SeqNumber, ' OK!')
    except Exception:
        print(SeqNumber, ' Error! Will try again later!')
        […]
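The second step is not shown in this excerpt. As a rough sketch of how the goal stated above (only the names of the top few hits per sequence) could be reached, the saved BLAST XML can be read with Python's standard library. This is not the post's own code: the function name top_hit_names is hypothetical, the sample XML is a made-up, heavily trimmed fragment, and the only assumption about the format is that each hit is a Hit element with a Hit_def child holding its description.

```python
import xml.etree.ElementTree as ET


def top_hit_names(xml_text, max_hits=3):
    """Return the descriptions (Hit_def) of the first max_hits hits in BLAST XML text."""
    root = ET.fromstring(xml_text)
    names = []
    for hit in root.iter("Hit"):
        names.append(hit.findtext("Hit_def", default=""))
        if len(names) == max_hits:
            break
    return names


# Made-up, trimmed sample; real BLAST XML output contains many more fields.
sample = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit><Hit_num>1</Hit_num><Hit_def>Example organism A 16S rRNA gene</Hit_def></Hit>
        <Hit><Hit_num>2</Hit_num><Hit_def>Example organism B 16S rRNA gene</Hit_def></Hit>
        <Hit><Hit_num>3</Hit_num><Hit_def>Example organism C 16S rRNA gene</Hit_def></Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""

for name in top_hit_names(sample, max_hits=2):
    print(name)
```

Applied to the files written in step 1, the function would be called once per saved XML file with that file's contents. Biopython's own Bio.Blast.NCBIXML parser is another option for the same task.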