Databases and resources for personal genome interpretation

 

| Database | URL | Description |
| --- | --- | --- |
| Short variations: SNVs, short indels | | |
| 1000 Genomes | http://www.1000genomes.org | Human short variants and inferred genotypes |
| dbSNP | http://www.ncbi.nlm.nih.gov/projects/SNP | Short variants from all species |
| HapMap | http://www.hapmap.org | Human short variants […] |

IMG: another microbial genomes database (offers more information than NCBI)

IMG Home

Command-line tools for processing biological sequencing data. Barcode demultiplexing, adapter trimming, etc.

http://code.google.com/p/ea-utils/

 

Primarily written to support an Illumina-based pipeline, but should work with any FASTQ files.

Overview: fastq-mcf

Scans a sequence file for adapters, and, based on a log-scaled threshold, determines a set of clipping parameters and performs clipping. Also does skewing detection and quality filtering.
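To make the clipping and quality-filtering idea concrete, here is a minimal Python sketch. It is not fastq-mcf's algorithm: the log-scaled threshold and skew detection are not reproduced, and the function names, the `min_overlap` and `min_q` parameters, and the Phred+33 offset are assumptions chosen for illustration.

```python
def clip_adapter(seq, qual, adapter, min_overlap=3):
    """Clip a 3' adapter, including a partial adapter hanging off the read end.

    Illustrative only: exact matching, leftmost hit wins, so short chance
    matches near the read end are clipped too.
    """
    for start in range(len(seq) - min_overlap + 1):
        tail = seq[start:]
        # Either the read ends inside the adapter, or the whole adapter is present.
        if adapter.startswith(tail) or tail.startswith(adapter):
            return seq[:start], qual[:start]
    return seq, qual


def trim_low_quality(seq, qual, min_q=20, offset=33):
    """Remove 3' bases whose Phred score (assumed Phred+33) is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    read = "ACGTACGTAGATCGGAAG"
    print(clip_adapter(read, "I" * len(read), "AGATCGGAAGAGC"))
    # → ('ACGTACGT', 'IIIIIIII')
    print(trim_low_quality("ACGTACGT", "IIIIII##"))
    # → ('ACGTAC', 'IIIIII')
```

A real trimmer would also tolerate mismatches in the adapter and discard reads that become too short after clipping.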

fastq-multx

Demultiplexes a fastq. Capable of auto-determining barcode […]
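Demultiplexing can be sketched in a few lines of Python. This is a simplified illustration, not fastq-multx's method: it assumes the barcode sits at the start of each read, uses Hamming distance with a hypothetical `max_mismatches` cutoff, and does not attempt any auto-determination of barcodes. The sample names and barcodes are invented.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(read_seq, barcodes, max_mismatches=1):
    """Return the sample whose barcode best matches the read start, or None.

    Ambiguous reads (two barcodes tied for best) are left unassigned.
    """
    best_name, best_dist, tie = None, max_mismatches + 1, False
    for name, barcode in barcodes.items():
        if len(read_seq) < len(barcode):
            continue
        dist = hamming(read_seq[:len(barcode)], barcode)
        if dist < best_dist:
            best_name, best_dist, tie = name, dist, False
        elif dist == best_dist:
            tie = True
    return None if tie else best_name


def demultiplex(reads, barcodes, max_mismatches=1):
    """Group (name, seq, qual) reads by sample; unmatched reads go under None."""
    bins = {}
    for read in reads:
        sample = assign_barcode(read[1], barcodes, max_mismatches)
        bins.setdefault(sample, []).append(read)
    return bins


if __name__ == "__main__":
    barcodes = {"sampleA": "ACGT", "sampleB": "TGCA"}  # hypothetical samples
    reads = [("r1", "ACGTTTGG", "IIIIIIII"), ("r2", "GGGGTTTT", "IIIIIIII")]
    for sample, group in demultiplex(reads, barcodes).items():
        print(sample, [r[0] for r in group])
```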

An approximate workflow for repeating the phylogenetic analysis of strawberry

An approximate workflow for repeating the phylogenetic analysis of strawberry and other plant genomes would consist of the following steps:

1) Obtain protein and nucleotide sets from the identified sources. Extract subregions of protein and nucleotide sequences specified in the gene identifiers spreadsheet and group into files by family.

2) Search nucleotide sequences for papaya […]

ELPH : Estimated Locations of Pattern Hits

Overview

ELPH is a general-purpose Gibbs sampler for finding motifs in a set of DNA or protein sequences. The program takes as input a set containing anywhere from a few dozen to thousands of sequences, and searches through them for the most common motif, […]
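For readers unfamiliar with Gibbs sampling for motif finding, the following Python sketch shows the general technique. It is a textbook-style simplification and not ELPH's implementation: it assumes exactly one motif occurrence per sequence, a fixed motif `width`, a uniform background, and a hypothetical `pseudocount` parameter.

```python
import random


def gibbs_motif_search(seqs, width, iterations=200, seed=0, pseudocount=1.0):
    """Return one sampled motif start position per sequence."""
    rng = random.Random(seed)
    alphabet = sorted(set("".join(seqs)))
    # Start from random motif positions.
    pos = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for i, held_out in enumerate(seqs):
            # Build a position-specific count matrix from all other sequences.
            counts = [{a: pseudocount for a in alphabet} for _ in range(width)]
            for j, other in enumerate(seqs):
                if j == i:
                    continue
                for k in range(width):
                    counts[k][other[pos[j] + k]] += 1
            total = (len(seqs) - 1) + pseudocount * len(alphabet)
            # Score every window of the held-out sequence under the profile.
            weights = []
            for start in range(len(held_out) - width + 1):
                w = 1.0
                for k in range(width):
                    w *= counts[k][held_out[start + k]] / total
                weights.append(w)
            # Sample a new position in proportion to its score.
            pos[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return pos


if __name__ == "__main__":
    seqs = ["TTTTACGTACTTTT", "GGACGTACGGGGGG", "CCCCCCACGTACCC"]
    starts = gibbs_motif_search(seqs, width=6, iterations=50, seed=1)
    for s, p in zip(seqs, starts):
        print(p, s[p:p + 6])
```

Because the sampler is stochastic, practical tools typically run several restarts and keep the highest-scoring alignment; that step is omitted here.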

LOCAS, a new NGS assembler particularly designed for low coverage assembly of eukaryotic genomes

Next Generation Sequencing (NGS) is a frequently applied approach to detect sequence variations between highly related genomes. Recent large-scale re-sequencing studies such as the Human 1000 Genomes Project utilize NGS data of low coverage to afford sequencing of hundreds of individuals. Here, SNPs and micro-indels can be detected by applying an alignment-consensus approach. However, computational methods capable of […]

ANNOVAR: Functional annotation of genetic variants from high-throughput sequencing data

Description

ANNOVAR is an efficient software tool that uses up-to-date information to functionally annotate genetic variants detected from diverse genomes. Given a list of variants with chromosome, start position, end position and observed nucleotides, ANNOVAR can identify whether SNPs or indels cause protein-coding changes and which amino acids are affected, or identify variants […]
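The variant list described above can be pictured as one record per line. The Python sketch below parses such records and labels each as an SNV or indel. It does not perform any annotation and is not part of ANNOVAR; the five-column layout (chromosome, start, end, reference allele, observed allele), the use of `-` for an empty allele, and the example coordinates are assumptions made for illustration.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    start: int
    end: int
    ref: str
    alt: str


def parse_variant(line):
    """Parse one whitespace-separated record; extra columns are ignored."""
    chrom, start, end, ref, alt = line.split()[:5]
    return Variant(chrom, int(start), int(end), ref, alt)


def variant_type(v):
    """Classify a variant from its alleles ('-' is assumed to mean empty)."""
    if v.ref == "-":
        return "insertion"
    if v.alt == "-":
        return "deletion"
    if len(v.ref) == 1 and len(v.alt) == 1:
        return "SNV"
    return "substitution"


if __name__ == "__main__":
    # Hypothetical example records.
    for line in ["1 948921 948921 T C",
                 "1 13211293 13211294 TC -",
                 "1 11403596 11403596 - AT"]:
        v = parse_variant(line)
        print(v.chrom, v.start, variant_type(v))
```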

ABACAS: Algorithm Based Automatic Contiguation of Assembled Sequences

http://abacas.sourceforge.net/index.html

ABACAS is intended to rapidly contiguate (align, order, orientate), visualize and design primers to close gaps on shotgun assembled contigs based on a reference sequence.

ABACAS uses MUMmer to find alignment positions and identify syntenies of assembled contigs against the reference. The output is then processed to generate a pseudomolecule taking overlapping […]
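The ordering and orientation step can be illustrated with a short Python sketch. It is not ABACAS code: it assumes each contig already has a single best hit of the form (contig name, reference start, strand), joins contigs with a fixed run of `N` characters (the hypothetical `spacer` parameter), and ignores overlaps between neighbouring contigs entirely.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_pseudomolecule(contigs, hits, spacer=10):
    """Order contigs by reference position, orient them, and join with Ns.

    contigs: dict mapping contig name -> sequence
    hits:    list of (contig name, reference start, strand) tuples
    Returns (pseudomolecule sequence, ordered contig names).
    """
    ordered = sorted(hits, key=lambda hit: hit[1])
    parts = []
    for name, _ref_start, strand in ordered:
        seq = contigs[name]
        # Contigs aligned to the minus strand are flipped to match the reference.
        parts.append(reverse_complement(seq) if strand == "-" else seq)
    return ("N" * spacer).join(parts), [hit[0] for hit in ordered]


if __name__ == "__main__":
    contigs = {"c1": "AAAC", "c2": "GGGT"}        # hypothetical contigs
    hits = [("c2", 500, "+"), ("c1", 10, "-")]    # hypothetical alignment hits
    print(build_pseudomolecule(contigs, hits, spacer=2))
    # → ('GTTTNNGGGT', ['c1', 'c2'])
```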

ABBA: Assembly Boosted By Amino acid sequences

Overview

ABBA (Assembly Boosted By Amino acid sequences) is a comparative gene assembler that uses amino acid sequences from predicted proteins to help build a better assembly. See the journal paper.

[…]

One command line for getting a consensus sequence from a BAM file

samtools mpileup -uf ref.fa aln.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq
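The pipeline above has three stages: `samtools mpileup` computes genotype likelihoods against the reference, `bcftools view -cg` calls genotypes, and `vcfutils.pl vcf2fq` writes the consensus as FASTQ. These flags match the older samtools/bcftools releases the command was written for; newer releases moved calling to `bcftools call`. For scripted use, the same chain could be driven from Python roughly as follows. The function names and file paths are hypothetical, and the sketch assumes the three tools are on `PATH`.

```python
import subprocess


def consensus_commands(ref_fasta, bam):
    """Return the three pipeline stages as argument lists."""
    return [
        ["samtools", "mpileup", "-uf", ref_fasta, bam],
        ["bcftools", "view", "-cg", "-"],
        ["vcfutils.pl", "vcf2fq"],
    ]


def run_pipeline(commands, out_path):
    """Chain commands with pipes, writing the last stage's stdout to out_path."""
    procs = []
    prev_stdout = None
    with open(out_path, "wb") as out:
        for i, cmd in enumerate(commands):
            last = i == len(commands) - 1
            proc = subprocess.Popen(
                cmd,
                stdin=prev_stdout,
                stdout=out if last else subprocess.PIPE,
            )
            if prev_stdout is not None:
                # Close our copy so upstream sees SIGPIPE if downstream exits.
                prev_stdout.close()
            prev_stdout = proc.stdout
            procs.append(proc)
        return [proc.wait() for proc in procs]


if __name__ == "__main__":
    codes = run_pipeline(consensus_commands("ref.fa", "aln.bam"), "cns.fq")
    if any(codes):
        raise SystemExit(f"pipeline failed with exit codes {codes}")
```

Checking every stage's exit code matters here because a plain shell pipe reports only the status of the last command.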