An approximate workflow for repeating the phylogenetic analysis of strawberry and other plant genomes consists of the following steps:
1) Obtain protein and nucleotide sets from the identified sources. Extract the subregions of the protein and nucleotide sequences specified in the gene identifiers spreadsheet and group them into files by gene family.
2) Search the papaya and grape nucleotide sequences with the corresponding proteins from the other taxa, using the program GeneWise (Birney et al., 2004) with the -pep option, and save the top-scoring translation. Using tblastn from the BLAST package (Altschul et al., 1997) would work almost as well.
3) Merge the protein sequences and the translations by gene family and align each family using the program MUSCLE (Edgar, 2004) with default settings.
4) Remove poorly aligned regions using Gblocks (Talavera and Castresana, 2007) with the option -b5=h, which retains columns in which up to half of the sequences have gaps.
5) Concatenate the trimmed alignments and reformat them into Phylip format, labeling sequences by species name. Generate separate concatenations for genes missing entries in up to two taxa, genes missing entries in at most one taxon, and genes present in all taxa.
6) Run the RAxML program (Stamatakis, 2006) with the following command line: raxmlHPC -s alignment_file -n output_file -m PROTGAMMAGTR
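The concatenation in step 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script used for the original analysis: the function names `concatenate` and `to_phylip` are hypothetical, each trimmed alignment is assumed to be already loaded as a dictionary mapping species name to aligned sequence, taxa absent from a gene are assumed to be padded with gap characters, and the output is a relaxed (whitespace-delimited) Phylip layout.

```python
def concatenate(gene_alignments, taxa, max_missing=0):
    """Join per-gene alignments end to end.

    gene_alignments: list of dicts {species_name: aligned_sequence}
    taxa: species names to include, in output order
    max_missing: skip genes lacking more than this many taxa (assumed rule)
    """
    concat = {taxon: [] for taxon in taxa}
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one alignment differ in length")
        missing = [t for t in taxa if t not in aln]
        if len(missing) > max_missing:
            continue  # gene is absent from too many taxa for this data set
        width = lengths.pop()
        for taxon in taxa:
            # Assumption: pad a missing taxon with gaps of the gene's width.
            concat[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in concat.items()}


def to_phylip(concat):
    """Format a {species: sequence} dict as relaxed Phylip text."""
    names = list(concat)
    n_sites = len(concat[names[0]]) if names else 0
    pad = max(map(len, names), default=0) + 2
    lines = [f"{len(names)} {n_sites}"]
    lines += [f"{name.ljust(pad)}{concat[name]}" for name in names]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy data for illustration only; not real sequences.
    taxa = ["Fragaria", "Vitis", "Carica"]
    genes = [
        {"Fragaria": "MKT-A", "Vitis": "MKTLA", "Carica": "MRTLA"},
        {"Fragaria": "GGH", "Vitis": "GAH"},  # Carica entry missing
    ]
    for allowed in (0, 1, 2):
        print(to_phylip(concatenate(genes, taxa, max_missing=allowed)))
```

Calling `concatenate` with `max_missing` set to 0, 1, and 2 gives the three data sets described in step 5; each result can be written to a file and passed to RAxML through the `-s` argument.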