Trowel: a fast and accurate error correction module for Illumina sequencing reads

Trowel: a fast and accurate error correction module for Illumina sequencing reads.

PRADA is a pipeline to analyze paired end RNA-Seq data to generate gene expression values (RPKM) and gene-fusion candidates.

PRADA

PRADA
Overview
Description PRADA is a pipeline to analyze paired end RNA-Seq data to generate gene expression values (RPKM) and gene-fusion candidates.
Development Information
Language Python
Current Version 1.1
Platforms Un*x (OpenPBS)
License MIT
Status Active
Last Updated April 2013
References
Citations No Formal Publications
Help and Support
Contact Roel Verhaak
Discussion Project Forum

 

Massively parallel sequencing of cDNA reverse transcribed from RNA (RNASeq) provides an accurate estimate of the quantity and composition of mRNAs. To characterize the transcriptome through the analysis of RNA-seq data, we developed PRADA. PRADA focuses on the processing and analysis of gene expression estimates, supervised and unsupervised gene fusion identification, and supervised intragenic deletion identification. The BAM files generated by the pipeline are readily compatible with different tools for mutation calling and to obtain read counts for further downstream analysis.

Contents

[hide]

Modules

PRADA currently supports 6 modules to process and identify abnormalities from RNAseq data:

preprocess : Generates aligned and recalibrated BAM files.
fusion : Identifies candidate gene fusions.
guess-ft : Supervised search for fusion transcripts.
guess-if : Supervised search for intragenic rearrangements.
homology : Calculates homology between given two genes.
frame : Predicts functional consequence of fusion transcript

Documentation

Detail description of installation steps and the usage of each module with examples is available in the documentation.

Installation

在linux系统终端里用amber的tleap生成top和crd文件的基本程序 – 【AMBER】 – 分子模拟论坛 Molecular Simulation Forums – Powered by haotui.com

ff99SBildn

via 在linux系统终端里用amber的tleap生成top和crd文件的基本程序 – 【AMBER】 – 分子模拟论坛 Molecular Simulation Forums – Powered by haotui.com.

How to Run AMBER

AMBER, that stands for Assisted Model Building with Energy Refinement, is the collective name for a suite of programs that allow users to carry out molecular dynamics simulations, particularly on biomolecules.

Version 10 of AMBER has been installed on Lewis.

On lewis, it is installed under /share/apps/amber10.  The executables are under /share/apps/amber10/exe.

1. Set the PATH Environment Variable

Users may include the executables in their PATH by typing the following command so that they don’t have to include the path in their commands to run the AMBER program/tools.

export PATH=$PATH:/share/apps/amber10/exe

or add the above command to your “~/.bashrc” file.

2. Prepare Input Files

Before running AMBER, users must generate AMBER topology and coordinate files. The following are a simple example to run an AMBER(GB) minimization for “test.pdb”.

First, create a script file named “test.leap.scrpt” that includes the following line:

source leaprc.ff03
set default PBradii bondi
mol = loadpdb test.pdb
saveamberparm mol test.prmtop test.prmcrd
quit

where “leaprc.ff03″ is the AMBER force field parameter file.

Then, run “tleap” by typing on command line:

tleap -f test.leap.scrpt

This will generate the topology file “test.prmtop” and coordinate file “test.prmcrd”.

3. Run AMBER Jobs

There are both serial and parallel versions of the executables.  

1) Run Serial Jobs on Lewis

The serial version can be run with the following command

bsub -J jobname -oo jobname.o%J -eo jobname.e%J sander -O -i inputfile -o outfile -p test.prmtop -c test.prmcrd -r test.restrt

where the input file “inputfile” looks like this

Minimization
&cntrl
imin=1, maxcyc=100,
cut=300.0, igb=2, saltcon=0.2, gbsa=1,
ntpr=10, ntx=1, ntb=0,
&end

2) Run Parallel Jobs on Lewis

The parallel versions have a qualifier of “.MPI” on the end, e.g. sander.MPI.  A sample LSF job script on lewis, for the parallel version of sander might look like:

# Set job parameters
#BSUB -a openmpi
#BSUB -J jobname
#BSUB -oo jobname.o%J
#BSUB -eo jobname.e%J

# Set number of CPUs
#BSUB -n 4

# Start AMBER job
mpirun.lsf sander.MPI -O -i inputfile -o outputfile -p test.prmtop -c test.prmcrd -r test.restrt

4. Convert the Minimized Coordinates to a PDB File

The miminized structure can be converted to the PDB file by the following command

ambpdb -p test.prmtop < test.restrt > test_min.pdb

*There are a number of tutorials prepared by the AMBER developers about learning how to use the AMBER software suite, which are listed as follows.

For more information of AMBER, see the Amber Users’ Manual.

Amber generate file of inpcrd problem

source leaprc.ff10
loadamberparams frcmod.ionsjc_tip3p
set default PBradii mbondi2
1vom = loadpdb 1vom-clean.pdb
addions 1vom Na+
saveamberparm 1vom prmtop inpcrd

########### resolve a problem

export AMBERHOME=`pwd`
root@shenzy-pc:/home/shenzy/lib/amber14# ./update_amber –apply leap.patch
export AMBERHOME=`pwd`
./update_amber –apply http://ambermd.org/bugfixes/AmberTools/13.0/cygwin_fix

 

root@shenzy-pc:/home/shenzy/lib/amber14# cat leap.patch
*******>leap.patch

Author: Jason Swails

Date: 30 April, 2014

Programs: tleap, xleap

Description: Solves potential inpcrd file truncation from tleap

——————————————————————————–

diff –git AmberTools/src/leap/src/leap/unitio.c AmberTools/src/leap/src/leap/unitio.c
index cbab939..4e78ee3 100644
— AmberTools/src/leap/src/leap/unitio.c
+++ AmberTools/src/leap/src/leap/unitio.c
@@ -6897,6 +6897,7 @@ if (bPolar && GDefaults.iIPOL <= 0) // NOT allowed to save IPOL=0 for polarizabl
VarArrayDestroy(&vaExcludedCount);
VarArrayDestroy(&vaNBIndex);
VarArrayDestroy(&vaNonBonds);
+ fclose( fCrd );

}

@@ -8528,6 +8529,7 @@ IX_DESC iResIx;
VarArrayDestroy( &vaExcludedCount );
VarArrayDestroy( &vaNBIndex );
VarArrayDestroy( &vaNonBonds );
+ fclose( fCrd );

}
static char *prepfmt = ” %-3d %-4s %-4s %c %f %f %f %f\n”;

 

 

 

> Thanks for the response. 

> Yes, I am sure that I am using ntb = 0. I also know that it works prior to 
> adding ions, so adding ions is what breaks it. 


​Yikes… Don’t do this ;). If you plan on running with explicit solvent, 
do this off the bat. (pmemd.cuda doesn’t even support vacuum dynamics). 
 The best way to add ions to a system via tleap is actually to add the 
solvent first and then replace random solvent molecules with ions (at least 
IMO — YMMV). So something like this: 

system = loadPDB you_system.pdb 
ions = loadAmberParams frcmod.ionsjc_tip3p 
solvateOct system TIP3PBOX 15.0 
addIonsRand system Na+ 0 
saveAmberParm system your_system.parm7 your_system.rst7 

The first line loads your system. The second loads the ion parameters (as 
Ilyas mentioned, the “ions=” part is optional). The third solvates the 
system with TIP3P waters in a truncated octahedron with at least a 15 
Angstrom buffer around the solute.​ The fourth replaces enough waters with 
sodium ions to neutralize the system. The last saves your topology file. 
(Of course this assumes a default leaprc file, which you mentioned you had.) 


> Interestingly, the last two lines of the inpcrd of the protein only system 
> reads 

> 82.4450000 14.3940000 -13.6870000 81.7180000 15.3700000 -13.4430000 
> 83.6048830 14.5521392 -13.3126966 

> whereas the inpcrd with ions reads 

> 86.3470000 12.2630000 -15.2900000 87.2459461 11.6555756 -15.3950273 
> 86.5023142 13.2275008 -15.7734415 86.1831237 12.5406017 -13.8148107 
> 87.2007​ 


> This does not look correct to me. But it is purely generated by leap. 


​Looks like you got clipped with a bug we recently found (LEaP occasionally 
truncates files​). An update should be posted tomorrow morning (EST in the 
USA) — here is the relevant thread on the mailing list describing the bug 
http://archive.ambermd.org/201404/0532.html (there is a proposed patch 
attached there that you can download and apply if you don’t want to wait). 

Why are there 6 columns in the first place? I’d expect 3 columns * 
> numberOfAtoms rows. Forgive my gromacs inclination, is this the amber way? 


​Yes — this is the amber way. It is the x, y, z coordinates of two atoms. 

​​ 
> (CRD spec states FORMAT(6F12.7), which is again cryptic to me.) 


​This is a Fortran format statement. It indicates that there are at most 6 
Floating point numbers per line, each with 7 decimal points that take up 
exactly 12 characters. Amber is traditionally a Fortran code-base 
(exclusively back in the early days of Amber 3 to 4, although it is FAR 
more diverse now). You can still see strong remnants of this Fortran 
heritage everywhere. 

And in the last one, there is a row with a single column. Is this supposed 
> to happen? 


​No. It’s a bug that should hopefully have a fix posted soon (see above). 
​ 


> Regards, 

> Murat 



> Here is the full(ish) output including the error. : 

> ​[snip] 

> ——————————————————————————– 
> 1. RESOURCE USE: 

> ——————————————————————————– 

> | Flags: 

> | NONPERIODIC ntb=0 and igb=0: Setting up nonperiodic simulation 
> At line 842 of file ew_box.F90 (unit = 9, file = ‘inpcrd’) 
> Fortran runtime error: End of file 


​You got hit by the leap bug — the file was truncated.​ 

Hope this helps, 
Jason 

Scratch Protein Predictor

Scratch Protein Predictor.

MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments

MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments.

R PheWAS: data analysis and plotting tools for phenome-wide association studies in the R environment

R PheWAS: data analysis and plotting tools for phenome-wide association studies in the R environment.

经典的gnome界面将在ubuntu12.04回归

Ubuntu 12.04于2012年4月26日发布。面对采用了Unity的Ubuntu 12.04,也许有人不愿意升级甚至投向其它Linux发行版,然而无论升级还是投入其它桌面环境,都意味着转换成本太大:使用习惯要改变,熟悉的系统自带软件也会面临改动。

Gnome, KDE, Lxde, Xfce等桌面环境历来各成一体,自带的常用软件非常不一样,熟能生巧可能比转向一个更好的工具更重要,再说另一个桌面环境下的软件就一定好用?而如果用户重新安装自己熟悉的软件,也绝非易事,折腾的事情并不是许多人愿意做的。

Ubuntu 12.04当然有性能方面的提升,那么如何享受到Ubuntu 12.04的性能提升,而又不改变一如既往的使用习惯呢?简言之:远离Unity,保持原来的经典桌面?

别忘了,Linux的自由虽然昂贵,但毕竟自由,Ubuntu也不例外,我们完全可以“自由地”返回到经典的Gnome界面。

要想删除Unity恢复到经典Gnome桌面也很简单,几乎就是一条命令的事情——命令这种东西虽然不直观,但非常可靠和快捷,同时按住Ctrl+Alt+T三个键,调出系统终端,输入:

sudo apt-get install gnome-session-fallback

设置Ubuntu 12.04 Unity返回到经典Gnome桌面

然后再输入系统密码,系统将安装经典界面,不喜欢命令行的话也可以在软件中心搜索‘gnome-panel’, 找到后点击安装。安装完成后登出系统重新登录。

设置Ubuntu 12.04 Unity返回到经典Gnome桌面

重新登录时选择小扳手图标,然后再选择下图所示的经典桌面,以后系统开机登录时将自动选择此桌面登录。

设置Ubuntu 12.04 Unity返回到经典Gnome桌面

在经典桌面下,如果要添加快捷方式到顶层面板,只需按住Alt键的同时,把要想添加的快捷方式拖放到面板上即可。如需删除或移除顶层面板上的快捷方式,同样需要按住Alt键并右击鼠标右键进行操作。

就这样很容易地退回到经典Gnome界面下的Ubuntu, 就像一切都没有发生过,拥有Ubuntu 12.04的核心却无Unity的界面。

本文图片来源:http://www.liberiangeek.net/

pseudogene predict

shenzy@shenzy-ubuntu:~/Desktop/zhanglei/pgenes/pseudopipe/bin$ ./pseudopipe.sh /home/shenzy/Desktop/zhanglei/pgenes/ppipe_output/prodigal/tcs4/ /home/shenzy/Desktop/zhanglei/pgenes/ppipe_input/prodigal/tcs4/dna/tcs4_genome.fasta.masked /home/shenzy/Desktop/zhanglei/pgenes/ppipe_input/prodigal/tcs4/dna/tcs4_genome.%s.fasta /home/shenzy/Desktop/zhanglei/pgenes/ppipe_input/prodigal/tcs4/pep/tcs4_prodigal.pep /home/shenzy/Desktop/zhanglei/pgenes/ppipe_input/prodigal/tcs4/mysql/contig.%s_exlocs 0
outDir=/home/shenzy/Desktop/zhanglei/pgenes/ppipe_output/prodigal/tcs4
rmkDir=/home/shenzy/Desktop/zhanglei/pgenes/ppipe_input/prodigal/tcs4/dna/tcs4_genome.fasta.masked
Making directories
Copying sequences
inputDNA=/home/shenzy/Desktop/zhanglei/pgenes/ppipe_input/prodigal/tcs4/dna/tcs4_genome.fasta.masked
Fomatting the DNAs
formatDB start…….
Preparing the blast jobs
Finished blast
Processing blast output
Finished processing blast output
Running Pseudopipe on both strands
Working on M strand
Finished Pseudopipe on strand M
Working on P strand
Finished Pseudopipe on strand P
Generating final results
Finished generating pgene full alignment
Finished running Pseudopipe

 

 

 

 

 

./pseudopipe.sh /home/shenzy/Desktop/zhanglei/pgenes/ppipe_output/caenorhabditis_elegans_62_220a /home/shenzy/Desktop/zhanglei/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/dna/dna_rm.fa /home/shenzy/Desktop/zhanglei/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/dna/Caenorhabditis_elegans.WS220.62.dna.chromosome.%s.fa /home/shenzy/Desktop/zhanglei/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/pep/Caenorhabditis_elegans.WS220.62.pep.fa /home/shenzy/Desktop/zhanglei/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/mysql/chr%s_exLocs 0

 

http://www.pseudogene.org/DOWNLOADS/pipeline_codes/readme.txt

http://www.pseudogene.org/DOWNLOADS/pipeline_codes/

 

shenzy@shenzy-ubuntu:~/Desktop/zhanglei/pgenes/pseudopipe/bin$ ./pseudopipe.sh /home/shenzy/Desktop/zhanglei/pgenes/ppipe_output/caenorhabditis_elegans_62_220a /home/shenzy/Desktop/zhanglei/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/dna/dna_rm.fa /home/shenzy/Desktop/zhanglei/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/dna/Caenorhabditis_elegans.WS220.62.dna.chromosome.%s.fa /home/shenzy/Desktop/zhanglei/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/pep/Caenorhabditis_elegans.WS220.62.pep.fa /home/shenzy/Desktop/zhanglei/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/mysql/chr%s_exLocs 0
outDir=/home/shenzy/Desktop/zhanglei/pgenes/ppipe_output/caenorhabditis_elegans_62_220a
rmkDir=/home/shenzy/Desktop/zhanglei/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/dna/dna_rm.fa
Making directories
Copying sequences
inputDNA=/home/shenzy/Desktop/zhanglei/pgenes/ppipe_input/caenorhabditis_elegans_62_220a/dna/dna_rm.fa
Fomatting the DNAs
Preparing the blast jobs
Finished blast
Processing blast output
Finished processing blast output
Running Pseudopipe on both strands
Working on M strand
Finished Pseudopipe on strand M
Working on P strand
Finished Pseudopipe on strand P
Generating final results
Finished generating pgene full alignment
Finished running Pseudopipe

 

./pseudopipe.sh

#/bin/sh
if [ $# -lt 5 ]
then
echo “Usage: ppipe [output dir] [masked dna dir] [input dna dir] [input pep dir] [exon mask dir] [PBS?]”
exit 1
fi

. `dirname $0`/../bin/env.sh

outDir=`if [ ! -d $1 ]; then mkdir -p $1; fi;cd $1;pwd`
rmkDir=`if [ -d $2 ]; then echo \`cd $2;pwd\`; else echo $2; fi`
pepDir=`if [ -d $4 ]; then echo \`cd $4;pwd\`; else echo $4; fi`
dnaTmp=`if [ -d $3 ]; then echo \`cd $3;pwd\`/%s.fa; else echo $3; fi`
emkTmp=`if [ -d $5 ]; then echo \`cd $5;pwd\`/%s_exLocs; else echo $5; fi`

echo “outDir=”$outDir
echo “rmkDir=”$rmkDir
inputDNA=$outDir/dna/dna_all.fa
inputPEP=$outDir/pep/pep_all.fa

jobsExec=$sqDummy; if [ “$6″ = “1” ]; then jobsExec=$sqDedicated; fi
echo Making directories
cd $outDir
if [ ! -d dna ]; then mkdir dna; fi
if [ ! -d pep ]; then mkdir pep; fi
if [ ! -d blast ]
then
mkdir blast
cd blast
mkdir stamps
mkdir output
mkdir status
mkdir processed
cd ..
fi
if [ ! -d pgenes ]
then
mkdir pgenes
cd pgenes
mkdir minus plus
cd minus
mkdir log stamp
cd ../plus
mkdir log stamp
cd ../../
fi

echo Copying sequences
if [ ! -f $inputDNA ]
then
if [ -f $rmkDir ]
then
inputDNA=$rmkDir
else
cat $rmkDir/*.fa > $inputDNA
fi
fi

echo “inputDNA=”$inputDNA

if [ ! -f $inputPEP ]
then
if [ -f $pepDir ]
then
inputPEP=$pepDir
else
cat $pepDir/*.fa > $inputPEP
fi
fi

echo Fomatting the DNAs
cd dna
if [ ! -f formatdb.log ]
then
if [ -f $rmkDir -a -f $rmkDir.nin ]
then
for i in $rmkDir.*
do
ext=`echo $i | sed ‘s/.*\.\(.*\)/\1/’`
if [ ! -e $inputDNA.$ext ]
then
ln -s $i $inputDNA.$ext
fi
done
touch formatdb.log
else
echo “formatDB start…….”
$formatDB -i $inputDNA -o T -p F
fi
fi
cd ..

echo Preparing the blast jobs
cd blast

if [ ! -f jobs ]
then
pepNum=`grep ‘>’ $inputPEP | wc | sed ‘s/\s*\([0-9]*\).*/\1/’`
$pythonExec $fastaSplitter $inputPEP $((($pepNum+359)/360)) ‘split%04d’
for f in `ls split*`
do
echo “\”( cd $(pwd); touch stamps/${f}.Start ; ( $blastExec -p tblastn -m 8 -z 3.1e9 -e .1 -d $inputDNA -i $f -o output/${f}.Out ; touch stamps/${f}.Stamp ) >status/${f}.Status 2>&1 )\””
done > jobs

$pythonExec $jobsExec jobs

echo Finished blast
else
echo Skipping blast
fi

echo Processing blast output
cd processed
if [ ! -f ./processed.stamp ]
then
echo “\”(cd $(pwd); $pythonExec $blastHandler $inputPEP ‘split\d{4}.Out\Z’ ../output; touch processed.stamp)\”” > jobs

$pythonExec $jobsExec jobs

echo Finished processing blast output
else
echo Skipping the processing of blast output
fi

echo Running Pseudopipe on both strands
cd ../../pgenes

for t in ‘M’ ‘P’
do
echo ‘Working on ‘$t’ strand’

if [ $t = ‘M’ ]
then
cd minus
else
cd plus
fi

echo “export BlastoutSortedTemplate=${outDir}/blast/processed/%s_${t}_blastHits.sorted;export ChromosomeFastaTemplate=${dnaTmp};export ExonMaskTemplate=${emkTmp};export ExonMaskFields=’2 3′;export FastaProgram=${fastaExec};export ProteinQueryFile=${inputPEP}” > setenvPipelineVars

ms=$(cd ../../blast/processed ; for f in *_${t}_*sorted; do echo ${f/_${t}_blastHits.sorted/}; done)

for c in $ms
do
echo “\”(cd $(pwd); . setenvPipelineVars; touch stamp/$c.Start ; $pythonExec $pseudopipe $c > log/$c.log 2>&1; touch stamp/$c.Stop)\””
done > jobs

$pythonExec $jobsExec jobs

echo Finished Pseudopipe on strand $t

cd ..
done

echo Generating final results
outFilePrefix=$outDir/pgenes/`basename $outDir`_pgenes
$genPgeneResult $outDir $outFilePrefix.txt
$genFullAln $outDir $outFilePrefix.align.gz

echo Finished running Pseudopipe

 

NOTE:

sh1 source not found

sh source not work,    we need to use “.” to replace “source” in the sh script !

 

sh: 1: .: setenvPipelineVars: not found
Finished Pseudopipe on strand M
Working on P strand
sh: 1: .: setenvPipelineVars: not found

we must give it an absolute path for the file of  setenvPipelineVars

echo “\”(cd $(pwd); .  /home/shenzy/lib/pgenes/ppipe_output/tcs/pgenes/minus/setenvPipelineVars; touch stamp/$c.Start ; $pythonExec $pseudopipe $c > log/$c.log 2>&1; touch stamp/$c.Stop)\””