ELPH : Estimated Locations of Pattern Hits

Overview

ELPH is a general-purpose Gibbs sampler for finding motifs in a set of DNA or protein sequences. The program takes as input a set containing anywhere from a few dozen to thousands of sequences, and searches through them for the most common motif, […]

LOCAS, a new NGS assembler designed for low-coverage assembly of eukaryotic genomes

Next Generation Sequencing (NGS) is a frequently applied approach to detect sequence variations between highly related genomes. Recent large-scale re-sequencing studies such as the Human 1000 Genomes Project use low-coverage NGS data to make sequencing of hundreds of individuals affordable. Here, SNPs and micro-indels can be detected by applying an alignment-consensus approach. However, computational methods capable of […]

ANNOVAR: Functional annotation of genetic variants from high-throughput sequencing data

Description

ANNOVAR is an efficient software tool that uses up-to-date information to functionally annotate genetic variants detected from diverse genomes. Given a list of variants with chromosome, start position, end position and observed nucleotides, ANNOVAR can identify whether SNPs or indels cause protein-coding changes and which amino acids are affected, or identify variants […]

ABACAS: Algorithm Based Automatic Contiguation of Assembled Sequences

http://abacas.sourceforge.net/index.html

ABACAS is intended to rapidly contiguate (align, order, orientate), visualize and design primers to close gaps on shotgun assembled contigs based on a reference sequence.

ABACAS uses MUMmer to find alignment positions and identify syntenies of assembled contigs against the reference. The output is then processed to generate a pseudomolecule taking overlapping […]

ABBA: Assembly Boosted By Amino acid sequences

Overview

ABBA (Assembly Boosted By Amino acid sequences) is a comparative gene assembler that uses amino acid sequences from predicted proteins to help build a better assembly. See the journal paper.

[…]

One command line for getting consensus sequences from a BAM file

samtools mpileup -uf ref.fa aln.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq

Merging separate sequence and quality files to FASTQ

#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;
use Bio::Seq::Quality;
use Getopt::Long;

die "pass a fasta and a fasta-quality file\n" unless @ARGV;

my ($seq_infile, $qual_infile) =
    (scalar @ARGV == 1) ? ($ARGV[0], "$ARGV[0].qual") : @ARGV;

## Create input objects for both a seq (fasta) and qual file
my $in_seq_obj = Bio::SeqIO->new(
    -file   => $seq_infile,
    -format => 'fasta',
);
my […]
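The Perl excerpt above is truncated, so the following is only a rough Python sketch of the same idea, not a reconstruction of the original script. It assumes the FASTA and QUAL files list records in the same order, that quality values are space-separated integer Phred scores, and that the output should use the Sanger offset of 33. The function names `read_fasta_like` and `merge_to_fastq` are hypothetical.

```python
import io


def read_fasta_like(handle):
    """Yield (record_id, body_lines) from a FASTA-style stream (FASTA or QUAL)."""
    record_id, body = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, body
            # Keep only the first word of the header as the identifier.
            record_id, body = line[1:].split()[0], []
        else:
            body.append(line)
    if record_id is not None:
        yield record_id, body


def merge_to_fastq(fasta_handle, qual_handle, offset=33):
    """Pair FASTA and QUAL records in order and return FASTQ text.

    Note: zip() stops at the shorter input, so extra records in one
    file are silently ignored in this sketch.
    """
    out = []
    for (seq_id, seq_lines), (qual_id, qual_lines) in zip(
        read_fasta_like(fasta_handle), read_fasta_like(qual_handle)
    ):
        if seq_id != qual_id:
            raise ValueError(f"record order mismatch: {seq_id} vs {qual_id}")
        seq = "".join(seq_lines)
        scores = [int(s) for s in " ".join(qual_lines).split()]
        if len(scores) != len(seq):
            raise ValueError(f"length mismatch for {seq_id}")
        # Encode each Phred score as one ASCII character (assumed offset 33).
        quals = "".join(chr(s + offset) for s in scores)
        out.append(f"@{seq_id}\n{seq}\n+\n{quals}\n")
    return "".join(out)


fasta = io.StringIO(">read1\nACGT\n")
qual = io.StringIO(">read1\n40 30 20 10\n")
print(merge_to_fastq(fasta, qual), end="")
```

For real data, replace the `io.StringIO` objects with open file handles. The explicit ID and length checks are there because a silent mismatch between the two files would produce a valid-looking but wrong FASTQ.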

SAM to BAM: file format processing

Use the built-in “samtools sort” command rather than a generic sort. Samtools sort works on BAM files, so first convert your alignments to BAM format, then sort and index the result:

samtools view -bS filename.sam > filename.bam
samtools sort filename.bam sorted
samtools index sorted.bam

This of course assumes you have generated the SAM file using method (1).

The file “sorted.bam” […]
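The commands above use the older samtools syntax, where `sort` takes an output prefix. As a hedged sketch only, the same three steps could be scripted in Python as follows, assuming samtools 1.x, where `sort` writes to the file given with `-o` and `view` detects SAM input without `-S`. The helper names `sam_to_sorted_bam_commands` and `run_pipeline` are hypothetical, and the output name `sorted.bam` simply mirrors the excerpt.

```python
import subprocess
from pathlib import Path


def sam_to_sorted_bam_commands(sam_path):
    """Return the three samtools invocations as argument lists.

    Assumes samtools 1.x command-line syntax.
    """
    sam = Path(sam_path)
    bam = sam.with_suffix(".bam")
    sorted_bam = sam.with_name("sorted.bam")
    return [
        # 1. Convert SAM to BAM.
        ["samtools", "view", "-b", "-o", str(bam), str(sam)],
        # 2. Sort the BAM by coordinate.
        ["samtools", "sort", "-o", str(sorted_bam), str(bam)],
        # 3. Index the sorted BAM.
        ["samtools", "index", str(sorted_bam)],
    ]


def run_pipeline(sam_path):
    """Run each step in order, stopping at the first failure."""
    for cmd in sam_to_sorted_bam_commands(sam_path):
        subprocess.run(cmd, check=True)


# Print the commands without executing them.
for cmd in sam_to_sorted_bam_commands("filename.sam"):
    print(" ".join(cmd))
```

Building the commands as lists and keeping execution in a separate function makes it easy to inspect what would run before calling `run_pipeline`.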

GYRA : CCMAR Computational Cluster (NGS)

Welcome to the CCMAR Computational Cluster Facility: GYRA – gyra.ualg.pt

The GYRA cluster facility is administered and maintained by Cymon J. Cox of the Plant Systematics and Bioinformatics Research Group (PSB).

System:

The GYRA cluster facility consists of:

Frontend: 16-core 2.3GHz 32GB DELL PowerEdge R715
compute-0-0: 8-core 2.6GHz 8GB DELL PowerEdge SC1435
compute-0-1: 8-core […]

Gene Prediction

Bacteria Gene Prediction

* GeneMark and Glimmer. Gene prediction provides the foundation for operon and promoter predictions. One way to verify the gene prediction result is to check for the presence of a Shine-Dalgarno sequence in front of each gene; this is a purine-rich region with the consensus AGGAGG, located within 20 bp upstream of […]
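The Shine-Dalgarno check described above can be sketched in Python. This is a simplified heuristic, not the method used by GeneMark or Glimmer: it assumes forward-strand genes only, a 0-based start-codon position, and that any substring of the AGGAGG consensus with at least four bases counts as a hit. The names `sd_cores` and `has_shine_dalgarno` and the `min_len` threshold are hypothetical.

```python
SD_CONSENSUS = "AGGAGG"


def sd_cores(min_len=4):
    """All substrings of the consensus with at least min_len bases."""
    n = len(SD_CONSENSUS)
    return {
        SD_CONSENSUS[i:j]
        for i in range(n)
        for j in range(i + min_len, n + 1)
    }


def has_shine_dalgarno(genome, gene_start, window=20, min_len=4):
    """Check the `window` bases upstream of a forward-strand gene start.

    gene_start is the 0-based index of the first base of the start codon.
    The partial-match rule (min_len) is an assumption of this sketch.
    """
    upstream = genome[max(0, gene_start - window):gene_start].upper()
    return any(core in upstream for core in sd_cores(min_len))


# Toy sequence: AGGAGG sits 6 bp upstream of the ATG at index 16.
genome = "TTTTAGGAGGTTTTTTATGAAA"
print(has_shine_dalgarno(genome, genome.index("ATG")))
```

Genes on the reverse strand would need the downstream region reverse-complemented before the same check is applied.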