An approximate workflow for repeating the phylogenetic analysis of strawberry

An approximate workflow for repeating the phylogenetic analysis of
strawberry and other plant genomes would consist of the following
steps: 

1) Obtain the protein and nucleotide sequence sets from the identified
sources. Extract the subregions of the protein and nucleotide sequences
specified in the gene identifiers spreadsheet and group them into files
by gene family.
2) Search the papaya and grape nucleotide sequences with the
corresponding proteins from the other taxa, using the program genewise
(Birney et al. 2004) with the -pep option, and save the top-scoring
translation.  tblastn from the BLAST package (Altschul et al. 1997)
would work almost as well.
3) Merge the protein sequences and the translations by gene family and
align each family using the program muscle (Edgar 2004) with default
settings.
4) Remove poorly aligned regions using Gblocks (Talavera and
Castresana 2007) with the option -b5=h to retain columns with some
gaps.
5) Concatenate the trimmed alignments and reformat them into Phylip
format, labeling each sequence by species name.  Generate three separate
concatenations: genes missing entries for up to 2 taxa, genes missing
entries for at most 1 taxon, and genes present in all taxa.
6) Run the RAxML program (Stamatakis 2006) with the following command
line:
raxmlHPC -s alignment_file -n output_file -m PROTGAMMAGTR
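
The concatenation in step 5 could be sketched roughly as follows. This is
only an illustration, not the script used for the original analysis: the
function names (read_fasta, concatenate, to_phylip) are hypothetical, the
trimmed alignments are assumed to be FASTA text labeled by species name,
taxa absent from a family are assumed to be padded with gap characters,
and the output is assumed to be the relaxed sequential Phylip layout
(name, whitespace, sequence on one line).

```python
def read_fasta(text):
    """Parse FASTA text into {species_label: aligned_sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # first word of the header is the label
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def concatenate(families, taxa, max_missing):
    """Join families that lack at most max_missing taxa.

    Taxa absent from a family are filled with '-' (an assumption of this
    sketch) so that every row of the result has the same length.
    """
    supermatrix = {t: [] for t in taxa}
    for aln in families:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("family is empty or not aligned to equal length")
        missing = [t for t in taxa if t not in aln]
        if len(missing) > max_missing:
            continue  # too many taxa lack this gene; leave the family out
        width = lengths.pop()
        for t in taxa:
            supermatrix[t].append(aln.get(t, "-" * width))
    return {t: "".join(parts) for t, parts in supermatrix.items()}


def to_phylip(matrix):
    """Format as relaxed sequential Phylip: 'ntax nchar', then one row per taxon."""
    ntax = len(matrix)
    nchar = len(next(iter(matrix.values()))) if matrix else 0
    lines = [f"{ntax} {nchar}"]
    for name, seq in matrix.items():
        lines.append(f"{name} {seq}")
    return "\n".join(lines) + "\n"
```

Calling concatenate with max_missing set to 2, 1 and 0 on the same list of
families would give the three concatenations described in step 5, each of
which can then be written out with to_phylip and passed to raxmlHPC via -s.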
