Velvet assembly based on a reference

(from a user-discussion group)

Dear Velvet users,

It’s my pleasure to announce the 1.0.01 release of Velvet!

Thanks to the Columbus module, Velvet now doubles up as an assisted assembler. In other words, if you already have sequences to guide the assembly (regions, contigs, exons, anything really), you can map the reads onto these sequences (or onto a reference genome if more convenient) and feed the reference along with the alignment files to Velvet.

Previously, it was possible to read SAM/BAM files and input reference sequences as -long. However, this approach had the weakness of occasionally folding up or aligning sequences that contained repeated k-mers. Columbus now treats reference sequences with care, preventing them from overlapping with one another. If they contain repeated k-mers, then separate instances of these k-mers are created. Previously, a read that contained a repeated k-mer was simply mapped to a tangled node. Columbus now uses the alignment information (which is presumably more reliable, being a global alignment) to assign the read to a given copy of this repeat.

At the same time, Velvet maintains all of its de novo capabilities. In other words, after assigning all that it can to reference regions, it has the capacity to edit, extend, connect, and occasionally rearrange the reference regions.

This module should be helpful for assisted transcriptome assembly, SV reconstruction, or local re-sequencing, for example. I included a fairly lengthy extension of the Manual to describe how to use Columbus in practice.

For practical purposes, Velvet remains backward compatible, so all your current scripts should go on working as before, with performance hardly affected.

Best regards,

Once you create a Velvet assembly you can check the following stats:

1) N50

2) Total number of bases

Then map the reads onto the assembled contigs and check:

3) Homo SNPs, as a check on base accuracy; that is, homozygous SNPs should not be there, since the reads and the assembly should have identical bases in principle.

4) Hetero SNPs. Heterozygous SNPs usually indicate assembly errors such as duplications, deletions, missing assembly, missing paralogs, etc. Only rarely does a large number of hetero SNPs turn out to be real, and assembly errors are deceptively difficult to find.

Daniel
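
The first two checks listed above (N50 and total number of bases) can be computed from any contig FASTA file with standard Unix tools. The following is only a minimal sketch: the function name contig_stats is made up for this illustration, and it measures contig lengths in bases as they appear in the FASTA file, which may not match the figures Velvet prints in its own log.

```shell
# contig_stats FASTA_FILE
# Prints the number of contigs, the total number of bases and the N50.
# (Hypothetical helper written for this post; not part of Velvet.)
contig_stats() {
  # Step 1: emit one length per FASTA record, allowing multi-line sequences.
  awk '
    /^>/ { if (len > 0) print len; len = 0; next }
    { gsub(/[ \t\r]/, ""); len += length($0) }
    END { if (len > 0) print len }
  ' "$1" |
  # Step 2: sort lengths from longest to shortest.
  sort -rn |
  # Step 3: N50 is the length of the contig at which the running sum
  # of sorted lengths first reaches half of the total.
  awk '
    { lens[NR] = $1; total += $1 }
    END {
      run = 0; n50 = 0
      for (i = 1; i <= NR; i++) {
        run += lens[i]
        if (2 * run >= total) { n50 = lens[i]; break }
      }
      printf "contigs=%d total_bases=%d N50=%d\n", NR, total, n50
    }'
}
```

For a Velvet run written to my_dir, this would be called as contig_stats my_dir/contigs.fa.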

The manual is inside of the Velvet release.

Using the Columbus extension to Velvet
Daniel Zerbino
June 12, 2010
Abstract
Since its 1.0 release, the Velvet short-read assembler contains a module called Columbus which allows the user to provide reference sequences along with mappings of sequencing reads onto those reference sequences, to efficiently assist the assembly process. This short manual describes how to use the Columbus module within the Velvet package. Users unfamiliar with Velvet’s use should first refer to the main Velvet Manual.
1 For impatient people
> head myRegions.fa
>chr1:123456789-123457789
ATGTGTGTACTAGCTAGCGCGCTAGCTAGTCATGTGTGTACTAGCTAGCGCGCTAGCTAGTC
[etc …]
> sort myReads.sam > mySortedReads.sam
> velveth my_dir 21 -reference myRegions.fa \
-shortPaired -sam mySortedReads.sam
> velvetg my_dir [etc …]

 
