Commonly used bioinformatics tools: UCSC command-line utilities

http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/

This directory contains applications for stand-alone use, 
built specifically for a Linux 64-bit machine.

For help on the bigBed and bigWig applications see:

http://genome.ucsc.edu/goldenPath/help/bigBed.html


http://genome.ucsc.edu/goldenPath/help/bigWig.html

View the file 'FOOTER' to see the usage statement for 
each of the applications.
      Name                    Last modified      Size  Description

      FOOTER                  12-Jun-2012 18:01   65K  
      bedClip                 12-Jun-2012 18:01  243K  
      bedExtendRanges         12-Jun-2012 18:01  2.7M  
      bedGraphToBigWig        12-Jun-2012 18:01  251K  
      bedItemOverlapCount     12-Jun-2012 18:01  2.7M  
      bedSort                 12-Jun-2012 18:01  224K  
      bedToBigBed             12-Jun-2012 18:01  334K  
      bigBedInfo              12-Jun-2012 18:01  327K  
      bigBedSummary           12-Jun-2012 18:01  330K  
      bigBedToBed             12-Jun-2012 18:01  326K  
      bigWigAverageOverBed    12-Jun-2012 18:01  334K  
      bigWigInfo              12-Jun-2012 18:01  251K  
      bigWigSummary           12-Jun-2012 18:01  251K  
      bigWigToBedGraph        12-Jun-2012 18:01  251K  
      bigWigToWig             12-Jun-2012 18:01  251K  
      blat/                   12-Jun-2012 18:01    -   
      faCount                 12-Jun-2012 18:01  163K  
      faFrag                  12-Jun-2012 18:01  160K  
      faOneRecord             12-Jun-2012 18:01  137K  
      faPolyASizes            12-Jun-2012 18:01  159K  
      faRandomize             12-Jun-2012 18:01  160K  
      faSize                  12-Jun-2012 18:01  163K  
      faSomeRecords           12-Jun-2012 18:01  142K  
      faToNib                 12-Jun-2012 18:01  166K  
      faToTwoBit              12-Jun-2012 18:01  256K  
      fetchChromSizes         12-Jun-2012 18:01  2.6K  
      genePredToGtf           12-Jun-2012 18:01  2.7M  
      gff3ToGenePred          12-Jun-2012 18:01  2.7M  
      gtfToGenePred           12-Jun-2012 18:01  2.7M  
      hgWiggle                12-Jun-2012 18:01  2.7M  
      htmlCheck               12-Jun-2012 18:01  235K  
      hubCheck                12-Jun-2012 18:01  2.7M  
      liftOver                12-Jun-2012 18:01  2.7M  
      liftOverMerge           12-Jun-2012 18:01  228K  
      liftUp                  12-Jun-2012 18:01  2.7M  
      mafSpeciesSubset        12-Jun-2012 18:01  159K  
      mafsInRegion            12-Jun-2012 18:01  236K  
      makeTableList           12-Jun-2012 18:01  2.7M  
      nibFrag                 12-Jun-2012 18:01  167K  
      overlapSelect           12-Jun-2012 18:01  2.7M  
      paraFetch               12-Jun-2012 18:01  210K  
      paraSync                12-Jun-2012 18:01  210K  
      pslCDnaFilter           12-Jun-2012 18:01  232K  
      pslPretty               12-Jun-2012 18:01  1.2M  
      pslReps                 12-Jun-2012 18:01  803K  
      pslSort                 12-Jun-2012 18:01  804K  
      sizeof                  12-Jun-2012 18:01  5.3K  
      stringify               12-Jun-2012 18:01  142K  
      textHistogram           12-Jun-2012 18:01  149K  
      twoBitInfo              12-Jun-2012 18:01  248K  
      twoBitToFa              12-Jun-2012 18:01  330K  
      validateFiles           12-Jun-2012 18:01  2.7M  
      wigCorrelate            12-Jun-2012 18:01  267K  
      wigToBigWig             12-Jun-2012 18:01  1.0M

================================================================
========   bedClip   ====================================
================================================================
bedClip - Remove lines from bed file that refer to off-chromosome places.
usage:
   bedClip input.bed chrom.sizes output.bed
options:
   -verbose=2 - set to get list of lines clipped and why
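
A minimal Python sketch of the kind of filtering bedClip describes. The exact
rules the program applies are not stated above, so the criteria here (unknown
chromosome, negative start, end past the chromosome size) are assumptions, and
clip_bed_lines is a hypothetical helper name.

```python
def clip_bed_lines(bed_lines, chrom_sizes):
    """Keep only BED lines that lie inside a known chromosome.

    bed_lines   -- iterable of whitespace-separated BED lines
    chrom_sizes -- dict mapping chromosome name to size in bases
    """
    kept = []
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # assumption: lines without chrom/start/end are dropped
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        size = chrom_sizes.get(chrom)
        # Assumed definition of "off-chromosome": unknown chrom or out of bounds.
        if size is None or start < 0 or end > size:
            continue
        kept.append(line)
    return kept
```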

================================================================
========   bedExtendRanges   ====================================
================================================================
bedExtendRanges - extend length of entries in bed 6+ data to be at least the given length,
taking strand directionality into account.

usage:
   bedExtendRanges database length file(s)

options:
   -host	mysql host
   -user	mysql user
   -password	mysql password
   -tab		Separate by tabs rather than space
   -verbose=N - verbose level for extra information to STDERR

example:

   bedExtendRanges hg18 250 stdin

   bedExtendRanges -user=genome -host=genome-mysql.cse.ucsc.edu hg18 250 stdin

will transform:
    chr1 500 525 . 100 +
    chr1 1000 1025 . 100 -
to:
    chr1 500 750 . 100 +
    chr1 775 1025 . 100 -
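
The transformation in the example above can be sketched in Python as follows.
This is an illustration of the strand-aware arithmetic only; extend_range is a
hypothetical name, and clipping against chromosome sizes (which the real
program looks up in the database) is not modelled beyond keeping the start
non-negative.

```python
def extend_range(start, end, strand, length):
    """Extend a BED interval to at least `length` bases, respecting strand.

    '+' items grow downstream from the start; '-' items grow upstream
    from the end. Intervals already long enough are returned unchanged.
    """
    if end - start >= length:
        return start, end
    if strand == "-":
        return max(0, end - length), end
    return start, start + length


# The two records from the example above, extended to 250 bases.
print(extend_range(500, 525, "+", 250))    # (500, 750)
print(extend_range(1000, 1025, "-", 250))  # (775, 1025)
```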

================================================================
========   bedGraphToBigWig   ====================================
================================================================
bedGraphToBigWig v 4 - Convert a bedGraph file to bigWig.
usage:
   bedGraphToBigWig in.bedGraph chrom.sizes out.bw
where in.bedGraph is a four column file in the format:
      <chrom> <start> <end> <value>
and chrom.sizes is two column: <chromosome name> <size in bases>
and out.bw is the output indexed big wig file.
Use the fetchChromSizes script to obtain the actual chrom.sizes information
from UCSC; please do not make up chrom sizes from your own information.
The input bedGraph file must be sorted; use the unix sort command:
  sort -k1,1 -k2,2n unsorted.bedGraph > sorted.bedGraph
options:
   -blockSize=N - Number of items to bundle in r-tree.  Default 256
   -itemsPerSlot=N - Number of data points bundled at lowest level. Default 1024
   -unc - If set, do not use compression.
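
Before running the conversion it can help to check that the input really is in
the order produced by the sort command above. A small Python sketch, with
hypothetical helper names; note that Python compares strings by code point,
which matches sort under LC_ALL=C but may differ from other locales.

```python
def bedgraph_sort_key(line):
    """Key equivalent to `sort -k1,1 -k2,2n`: chrom as text, start as number."""
    chrom, start = line.split()[:2]
    return (chrom, int(start))


def is_sorted_bedgraph(lines):
    """Return True if the bedGraph lines are ordered by chrom, then start."""
    keys = [bedgraph_sort_key(line) for line in lines]
    return all(a <= b for a, b in zip(keys, keys[1:]))


def sort_bedgraph(lines):
    """Return the lines in chrom,start order."""
    return sorted(lines, key=bedgraph_sort_key)
```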
================================================================
========   bedItemOverlapCount   ====================================
================================================================
bedItemOverlapCount - count number of times a base is overlapped by the
	items in a bed file.  Output is bedGraph 4 to stdout.
usage:
 sort bedFile.bed | bedItemOverlapCount [options] <database> stdin
To create a bigWig file from this data to use in a custom track:
 sort -k1,1 bedFile.bed | bedItemOverlapCount [options] <database> stdin \
         > bedFile.bedGraph
 bedGraphToBigWig bedFile.bedGraph chrom.sizes bedFile.bw
   where the chrom.sizes is obtained with the script: fetchChromSizes
   See also:

http://genome-test.cse.ucsc.edu/~kent/src/unzipped/utils/userApps/fetchChromSizes

options:
   -zero      add blocks with zero count, normally these are omitted
   -bed12     expect bed12 and count based on blocks
              Without this option, only the first three fields are used.
   -max       if counts per base overflow, set to max (4294967295) instead of exiting
   -outBounds output min/max to stderr
   -chromSize=sizefile	Read chrom sizes from file instead of database
             sizefile contains two white space separated fields per line:
		chrom name and size
   -host=hostname	mysql host used to get chrom sizes
   -user=username	mysql user
   -password=password	mysql password

Notes:
 * You may want to separate your + and - strand
   items before sending into this program as it only looks at
   the chrom, start and end columns of the bed file.
 * Program requires a <database> connection to look up chrom sizes for a sanity
   check of the incoming data.  Even when the -chromSize argument is used
   the <database> must be present, but it will not be used.

 * The bed file *must* be sorted by chrom
 * Maximum count per base is 4294967295. Recompile with new unitSize to increase this
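
To make the output concrete, here is a Python sketch of per-base overlap
counting that emits bedGraph-style rows. It is not the program's actual
implementation: it uses a simple sweep over interval endpoints, looks only at
chrom, start and end (as the notes above describe), omits zero-count regions
(the behaviour without -zero), and does not merge adjacent rows that happen to
have the same count. The name overlap_counts is hypothetical.

```python
from collections import defaultdict


def overlap_counts(bed_records):
    """Count how many items cover each base.

    bed_records -- iterable of (chrom, start, end) with 0-based, half-open
                   coordinates
    Returns a list of (chrom, start, end, count) rows with count > 0.
    """
    # +1 where an item starts, -1 where it ends.
    events = defaultdict(lambda: defaultdict(int))
    for chrom, start, end in bed_records:
        events[chrom][start] += 1
        events[chrom][end] -= 1

    rows = []
    for chrom in sorted(events):
        depth = 0
        prev = None
        for pos in sorted(events[chrom]):
            if prev is not None and depth > 0:
                rows.append((chrom, prev, pos, depth))
            depth += events[chrom][pos]
            prev = pos
    return rows
```

For two items chr1:0-10 and chr1:5-15 this yields three rows: 0-5 with count
1, 5-10 with count 2, and 10-15 with count 1.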
================================================================
========   bedSort   ====================================
================================================================
bedSort - Sort a .bed file by chrom,chromStart
usage:
   bedSort in.bed out.bed
in.bed and out.bed may be the same.
================================================================
========   bedToBigBed   ====================================
================================================================
bedToBigBed v. 2.0 - Convert bed file to bigBed. (BigBed version: 4)
usage:
   bedToBigBed in.bed chrom.sizes out.bb
Where in.bed is in one of the ascii bed formats, but not including track lines
and chrom.sizes is two column: <chromosome name> <size in bases>
and out.bb is the output indexed big bed file.
Use the fetchChromSizes script to obtain the actual chrom.sizes information
from UCSC; please do not make up chrom sizes from your own information.
The in.bed file must be sorted by chromosome, then start.
To sort a bed file, use the unix sort command:
     sort -k1,1 -k2,2n unsorted.bed > sorted.bed

options:
   -type=bedN[+[P]] : 
                      N is between 3 and 15, 
                      optional (+) if extra "bedPlus" fields, 
                      optional P specifies the number of extra fields. Not required, but preferred.
                      Examples: -type=bed6 or -type=bed6+ or -type=bed6+3 
                      (see http://genome.ucsc.edu/FAQ/FAQformat.html#format1)
   -as=fields.as - If you have non-standard "bedPlus" fields, it's great to put a definition
                   of each field in a row in AutoSql format here.
   -blockSize=N - Number of items to bundle in r-tree.  Default 256
   -itemsPerSlot=N - Number of data points bundled at lowest level. Default 512
   -unc - If set, do not use compression.
   -tab - If set, expect fields to be tab separated, normally
           expects white space separator.
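
The -type=bedN[+[P]] syntax can be read mechanically. The Python sketch below
parses such a value into its parts; it only illustrates the grammar described
above, and parse_bed_type is a hypothetical helper, not part of bedToBigBed.

```python
import re

# bed<N>, optionally followed by '+', optionally followed by a count P.
TYPE_RE = re.compile(r"^bed(\d+)(\+(\d*))?$")


def parse_bed_type(spec):
    """Parse a -type value such as 'bed6', 'bed6+' or 'bed6+3'.

    Returns (n_standard_fields, has_plus, n_extra_fields_or_None).
    """
    match = TYPE_RE.match(spec)
    if match is None:
        raise ValueError("not a bedN[+[P]] type: %r" % spec)
    n = int(match.group(1))
    if not 3 <= n <= 15:
        raise ValueError("N must be between 3 and 15, got %d" % n)
    has_plus = match.group(2) is not None
    extra = int(match.group(3)) if match.group(3) else None
    return n, has_plus, extra


print(parse_bed_type("bed6"))    # (6, False, None)
print(parse_bed_type("bed6+"))   # (6, True, None)
print(parse_bed_type("bed6+3"))  # (6, True, 3)
```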

================================================================
========   bigBedInfo   ====================================
================================================================
bigBedInfo - Show information about a bigBed file.
usage:
   bigBedInfo file.bb
options:
   -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs
   -chroms - list all chromosomes and their sizes
   -zooms - list all zoom levels and their sizes
   -as - get autoSql spec

================================================================
========   bigBedSummary   ====================================
================================================================
bigBedSummary - Extract summary information from a bigBed file.
usage:
   bigBedSummary file.bb chrom start end dataPoints
Get summary data from bigBed for indicated region, broken into
dataPoints equal parts.  (Use dataPoints=1 for simple summary.)
options:
   -type=X where X is one of:
         coverage - % of region that is covered (default)
         mean - average depth of covered regions
         min - minimum depth of covered regions
         max - maximum depth of covered regions
   -fields - print out information on fields in file.
      If fields option is used, the chrom, start, end, dataPoints
      parameters may be omitted
   -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs

================================================================
========   bigBedToBed   ====================================
================================================================
bigBedToBed - Convert from bigBed to ascii bed format.
usage:
   bigBedToBed input.bb output.bed
options:
   -chrom=chr1 - if set restrict output to given chromosome
   -start=N - if set, restrict output to only that over start
   -end=N - if set, restrict output to only that under end
   -maxItems=N - if set, restrict output to first N items
   -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs

================================================================
========   bigWigAverageOverBed   ====================================
================================================================
bigWigAverageOverBed - Compute average score of big wig over each bed, which may have introns.
usage:
   bigWigAverageOverBed in.bw in.bed out.tab
The output columns are:
   name - name field from bed, which should be unique
   size - size of bed (sum of exon sizes)
   covered - # bases within exons covered by bigWig
   sum - sum of values over all bases covered
   mean0 - average over bases with non-covered bases counting as zeroes
   mean - average over just covered bases
Options:
   -bedOut=out.bed - Make output bed that is echo of input bed but with mean column appended
   -sampleAroundCenter=N - Take sample at region N bases wide centered around bed item, rather
                     than the usual sample in the bed item.
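
The difference between mean0 and mean is easiest to see with numbers. The
Python sketch below computes the output columns for one bed item from a toy
per-base signal; it assumes the definitions listed above and is not the
program's implementation. The name average_over_blocks and the dict-based
signal are illustrative choices.

```python
def average_over_blocks(blocks, values):
    """Compute size, covered, sum, mean0 and mean for one bed item.

    blocks -- list of (start, end) exon blocks, 0-based half-open
    values -- dict mapping base position to signal value; positions
              missing from the dict count as not covered
    """
    size = 0
    covered = 0
    total = 0.0
    for start, end in blocks:
        size += end - start
        for pos in range(start, end):
            if pos in values:
                covered += 1
                total += values[pos]
    return {
        "size": size,
        "covered": covered,
        "sum": total,
        "mean0": total / size if size else 0.0,       # uncovered bases count as 0
        "mean": total / covered if covered else 0.0,  # covered bases only
    }
```

With two 4-base exons and signal on half of their bases (values 2, 2, 4, 4),
size is 8, covered is 4, sum is 12, mean0 is 1.5 and mean is 3.0.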

================================================================
========   bigWigInfo   ====================================
================================================================
bigWigInfo - Print out information about bigWig file.
usage:
   bigWigInfo file.bw
options:
   -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs
   -chroms - list all chromosomes and their sizes
   -zooms - list all zoom levels and their sizes
   -minMax - list the min and max on a single line

================================================================
========   bigWigSummary   ====================================
================================================================
bigWigSummary - Extract summary information from a bigWig file.
usage:
   bigWigSummary file.bigWig chrom start end dataPoints
Get summary data from bigWig for indicated region, broken into
dataPoints equal parts.  (Use dataPoints=1 for simple summary.)

NOTE:  start and end coordinates are in BED format (0-based)

options:
   -type=X where X is one of:
         mean - average value in region (default)
         min - minimum value in region
         max - maximum value in region
         std - standard deviation in region
         coverage - % of region that is covered
   -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs
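
Because start and end are in BED format, a 1-based inclusive position has to
be converted before it is passed on the command line. A small Python sketch of
the conversion, assuming the usual BED convention of a 0-based start and an
exclusive end; the function names are hypothetical.

```python
def one_based_to_bed(start, end):
    """Convert a 1-based inclusive range to BED (0-based, half-open)."""
    return start - 1, end


def bed_to_one_based(start, end):
    """Convert a BED (0-based, half-open) range to 1-based inclusive."""
    return start + 1, end


# A 1-based range 1001-2000 covers the same 1000 bases as BED 1000 2000.
print(one_based_to_bed(1001, 2000))  # (1000, 2000)
```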

================================================================
========   bigWigToBedGraph   ====================================
================================================================
bigWigToBedGraph - Convert from bigWig to bedGraph format.
usage:
   bigWigToBedGraph in.bigWig out.bedGraph
options:
   -chrom=chr1 - if set restrict output to given chromosome
   -start=N - if set, restrict output to only that over start
   -end=N - if set, restrict output to only that under end
   -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs

================================================================
========   bigWigToWig   ====================================
================================================================
bigWigToWig - Convert bigWig to wig.  This will keep more of the same structure of the
original wig than bigWigToBedGraph does, but still will break up large stepped sections
into smaller ones.
usage:
   bigWigToWig in.bigWig out.wig
options:
   -chrom=chr1 - if set restrict output to given chromosome
   -start=N - if set, restrict output to only that over start
   -end=N - if set, restrict output to only that under end
   -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs

================================================================
========   blat   ====================================
================================================================
blat - Standalone BLAT v. 34x12 fast sequence search command line tool
usage:
   blat database query [-ooc=11.ooc] output.psl
where:
   database and query are each either a .fa , .nib or .2bit file,
   or a list of these files, one file name per line.
   -ooc=11.ooc tells the program to load over-occurring 11-mers from
               an external file.  This will increase the speed
               by a factor of 40 in many cases, but is not required
   output.psl is where to put the output.
   Subranges of .nib and .2bit files may be specified using the syntax:
      /path/file.nib:seqid:start-end
   or
      /path/file.2bit:seqid:start-end
   or
      /path/file.nib:start-end
   With the last form (no seqid), a sequence id of file:start-end will be used.
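
The subrange syntax can be split apart as in the Python sketch below. This is
only an illustration of the three forms listed above; parse_subrange is a
hypothetical helper, and it assumes the file path itself contains no colons.

```python
def parse_subrange(spec):
    """Split '/path/file:seqid:start-end' or '/path/file:start-end'.

    Returns (path, seqid_or_None, start, end) with start and end as
    integers exactly as written in the spec.
    """
    parts = spec.split(":")
    if len(parts) == 3:
        path, seqid, rng = parts
    elif len(parts) == 2:
        path, rng = parts
        seqid = None  # form without an explicit sequence id
    else:
        raise ValueError("unrecognised subrange: %r" % spec)
    start, end = rng.split("-")
    return path, seqid, int(start), int(end)
```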
options:
   -t=type     Database type.  Type is one of:
                 dna - DNA sequence
                 prot - protein sequence
                 dnax - DNA sequence translated in six frames to protein
               The default is dna
   -q=type     Query type.  Type is one of:
                 dna - DNA sequence
                 rna - RNA sequence
                 prot - protein sequence
                 dnax - DNA sequence translated in six frames to protein
                 rnax - DNA sequence translated in three frames to protein
               The default is dna
   -prot       Synonymous with -t=prot -q=prot
   -ooc=N.ooc  Use overused tile file N.ooc.  N should correspond to 
               the tileSize
   -tileSize=N sets the size of match that triggers an alignment.  
               Usually between 8 and 12
               Default is 11 for DNA and 5 for protein.
   -stepSize=N spacing between tiles. Default is tileSize.
   -oneOff=N   If set to 1 this allows one mismatch in a tile and still
               triggers an alignment.  Default is 0.
   -minMatch=N sets the number of tile matches.  Usually set from 2 to 4
               Default is 2 for nucleotide, 1 for protein.
   -minScore=N sets minimum score.  This is the matches minus the 
               mismatches minus some sort of gap penalty.  Default is 30
   -minIdentity=N Sets minimum sequence identity (in percent).  Default is
               90 for nucleotide searches, 25 for protein or translated
               protein searches.
   -maxGap=N   sets the size of maximum gap between tiles in a clump.  Usually
               set from 0 to 3.  Default is 2. Only relevant for minMatch > 1.
   -noHead     suppress .psl header (so it's just a tab-separated file)
   -makeOoc=N.ooc Make overused tile file. Target needs to be complete genome.
   -repMatch=N sets the number of repetitions of a tile allowed before
               it is marked as overused.  Typically this is 256 for tileSize
               12, 1024 for tile size 11, 4096 for tile size 10.
               Default is 1024.  Typically only comes into play with makeOoc.
               Also affected by stepSize. When stepSize is halved repMatch is
               doubled to compensate.
   -mask=type  Mask out repeats.  Alignments won't be started in masked region
               but may extend through it in nucleotide searches.  Masked areas
               are ignored entirely in protein or translated searches. Types are
                 lower - mask out lower cased sequence
                 upper - mask out upper cased sequence
                 out   - mask according to database.out RepeatMasker .out file
                 file.out - mask database according to RepeatMasker file.out
   -qMask=type Mask out repeats in query sequence.  Similar to -mask above but
               for query rather than target sequence.
   -repeats=type Type is same as mask types above.  Repeat bases will not be
               masked in any way, but matches in repeat areas will be reported
               separately from matches in other areas in the psl output.
   -minRepDivergence=NN - minimum percent divergence of repeats to allow 
               them to be unmasked.  Default is 15.  Only relevant for 
               masking using RepeatMasker .out files.
   -dots=N     Output dot every N sequences to show program's progress
   -trimT      Trim leading poly-T
   -noTrimA    Don't trim trailing poly-A
   -trimHardA  Remove poly-A tail from qSize as well as alignments in 
               psl output
   -fastMap    Run for fast DNA/DNA remapping - not allowing introns, 
               requiring high %ID. Query sizes must not exceed 5000.
   -out=type   Controls output file format.  Type is one of:
                   psl - Default.  Tab separated format, no sequence
                   pslx - Tab separated format with sequence
                   axt - blastz-associated axt format
                   maf - multiz-associated maf format
                   sim4 - similar to sim4 format
                   wublast - similar to wublast format
                   blast - similar to NCBI blast format
                   blast8- NCBI blast tabular format
                   blast9 - NCBI blast tabular format with comments
   -fine       For high quality mRNAs look harder for small initial and
               terminal exons.  Not recommended for ESTs
   -maxIntron=N  Sets maximum intron size. Default is 750000
   -extendThroughN - Allows extension of alignment through large blocks of N's
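
The typical -repMatch values quoted above follow a simple pattern, sketched
here in Python. Only the three tile sizes listed in the option text are
included. The step-size scaling is an assumption that generalises "when
stepSize is halved repMatch is doubled" to a proportional rule; it is not
taken from the BLAT source, and rep_match_for is a hypothetical name.

```python
# Typical values quoted in the -repMatch description.
TYPICAL_REP_MATCH = {12: 256, 11: 1024, 10: 4096}


def rep_match_for(tile_size, step_size=None):
    """Return a typical repMatch for a tile size, scaled for step size.

    Assumption: repMatch scales inversely with stepSize, so halving the
    step doubles the value. With step_size equal to tile_size (the
    default) the quoted value is returned unchanged.
    """
    base = TYPICAL_REP_MATCH[tile_size]
    if step_size is None:
        step_size = tile_size
    return base * tile_size // step_size
```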
================================================================
========   faCount   ====================================
================================================================
faCount - count base statistics and CpGs in FA files.
usage:
   faCount file(s).fa
     -summary  show only summary statistics
     -dinuc    include statistics on dinucleotide frequencies
     -strands  count bases on both strands

================================================================
========   faFrag   ====================================
================================================================
faFrag - Extract a piece of DNA from a .fa file.
usage:
   faFrag in.fa start end out.fa
options:
   -mixed - preserve mixed-case in FASTA file

================================================================
========   faOneRecord   ====================================
================================================================
faOneRecord - Extract a single record from a .FA file
usage:
   faOneRecord in.fa recordName

================================================================
========   faPolyASizes   ====================================
================================================================
faPolyASizes - get poly A sizes
usage:
   faPolyASizes in.fa out.tab

output file has four columns:
   id seqSize tailPolyASize headPolyTSize
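
A rough Python sketch of what the three numeric columns mean. It counts only
exact, uninterrupted runs of A at the end and T at the start of a sequence;
the real program may score tails differently (for example by tolerating
mismatches), so treat this as an approximation. The name poly_sizes is
hypothetical.

```python
def poly_sizes(seq):
    """Return (seqSize, tailPolyASize, headPolyTSize) for one sequence.

    Assumption: a poly-A tail is the run of 'A' at the very end and a
    poly-T head is the run of 'T' at the very start, case-insensitive.
    """
    s = seq.upper()
    tail_a = len(s) - len(s.rstrip("A"))
    head_t = len(s) - len(s.lstrip("T"))
    return len(s), tail_a, head_t


print(poly_sizes("TTTGCAAAA"))  # (9, 4, 3)
```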

options:

================================================================
========   faRandomize   ====================================
================================================================
faRandomize - Program to create random fasta records using
same base frequency as seen in original fasta records.
Use optional -seed flag to specify seed for random number
generator.
usage:
   faRandomize in.fa randomized.fa

================================================================
========   faSize   ====================================
================================================================
faSize - print total base count in fa files.
usage:
   faSize file(s).fa
Command flags
   -detailed        outputs name and size of each record
                    has the side effect of printing nothing else
   -tab             output statistics in a tab separated format
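
For reference, the per-record sizes that -detailed reports can be reproduced
with a few lines of Python. This sketch counts every character on a sequence
line as a base and uses the first word of the header as the record name; both
are assumptions about the format rather than statements about faSize itself.

```python
def fasta_record_sizes(lines):
    """Return a dict mapping each FASTA record name to its length."""
    sizes = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            sizes[name] = 0
        elif name is not None:
            sizes[name] += len(line)
    return sizes


records = [">seq1 example", "ACGT", "AC", ">seq2", "NNN"]
print(fasta_record_sizes(records))  # {'seq1': 6, 'seq2': 3}
```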

================================================================
========   faSomeRecords   ====================================
================================================================
faSomeRecords - Extract multiple fa records
usage:
   faSomeRecords in.fa listFile out.fa
options:
   -exclude - output sequences not in the list file.

================================================================
========   faToNib   ====================================
================================================================
faToNib - Convert from .fa to .nib format
usage:
   faToNib [options] in.fa out.nib
options:
   -softMask - create nib that soft-masks lower case sequence
   -hardMask - create nib that hard-masks lower case sequence

================================================================
========   faToTwoBit   ====================================
================================================================
faToTwoBit - Convert DNA from fasta to 2bit format
usage:
   faToTwoBit in.fa [in2.fa in3.fa ...] out.2bit
options:
   -noMask       - Ignore lower-case masking in fa file.
   -stripVersion - Strip off version number after . for genbank accessions.
   -ignoreDups   - only convert first sequence if there are duplicates

================================================================
========   fetchChromSizes   ====================================
================================================================
usage: fetchChromSizes <db> > <db>.chrom.sizes
   used to fetch chrom.sizes information from UCSC for the given <db>
<db> - name of UCSC database, e.g.: hg18, mm9, etc ...

This script expects to find one of the following commands:
   wget, mysql, or ftp in order to fetch information from UCSC.
Route the output to the file <db>.chrom.sizes as indicated above.

Example:   fetchChromSizes hg18 > hg18.chrom.sizes
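
The resulting chrom.sizes file is a two-column table (chromosome name, size in
bases) and is easy to load for use in other scripts. A minimal Python sketch;
parse_chrom_sizes is a hypothetical helper and the values in the example are
made up for illustration, not real assembly sizes.

```python
def parse_chrom_sizes(lines):
    """Parse chrom.sizes lines into a dict of name -> size in bases."""
    sizes = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        sizes[fields[0]] = int(fields[1])
    return sizes


# Toy input; real files come from fetchChromSizes.
example = ["chrA\t1000", "chrB\t250", ""]
print(parse_chrom_sizes(example))  # {'chrA': 1000, 'chrB': 250}
```

To read a real file, pass an open file object, for example
parse_chrom_sizes(open("hg18.chrom.sizes")).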
================================================================
========   genePredToGtf   ====================================
================================================================
genePredToGtf - Convert genePred table or file to gtf.
usage:
   genePredToGtf database genePredTable output.gtf
If database is 'file' then genePredTable is interpreted as a file
rather than a table in database.
options:
   -utr - Add 5UTR and 3UTR features
   -honorCdsStat - use cdsStartStat/cdsEndStat when defining start/end
    codon records
   -source=src - set source name to use
   -addComments - Add comments before each set of transcript records.
    allows for easier visual inspection
Note: use a refFlat table or extended genePred table or file to include
the gene_name attribute in the output.  This will not work with a refFlat
table dump file. If you are using a genePred file that starts with a numeric
bin column, drop it using the UNIX cut command:
    cut -f 2- in.gp | genePredToGtf file stdin out.gtf
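
If the same clean-up is needed inside a script, the bin column can be dropped
in Python. Unlike cut -f 2-, this sketch removes the first column only when it
is purely numeric, which is an added safety check and an assumption about how
to recognise a bin column; drop_bin_column is a hypothetical helper.

```python
def drop_bin_column(lines):
    """Remove a leading numeric bin column from tab-separated genePred lines."""
    cleaned = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 1 and fields[0].isdigit():
            fields = fields[1:]  # first column looks like a bin number
        cleaned.append("\t".join(fields))
    return cleaned
```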

================================================================
========   gfClient   ====================================
================================================================
gfClient v. 34x12 - A client for the genomic finding program that produces a .psl file
usage:
   gfClient host port seqDir in.fa out.psl
where
   host is the name of the machine running the gfServer
   port is the same as you started the gfServer with
   seqDir is the path of the .nib or .2bit files relative to the current dir
       (note these are needed by the client as well as the server)
   in.fa is a fasta format file.  May contain multiple records
   out.psl where to put the output
options:
   -t=type     Database type.  Type is one of:
                 dna - DNA sequence
                 prot - protein sequence
                 dnax - DNA sequence translated in six frames to protein
               The default is dna
   -q=type     Query type.  Type is one of:
                 dna - DNA sequence
                 rna - RNA sequence
                 prot - protein sequence
                 dnax - DNA sequence translated in six frames to protein
                 rnax - DNA sequence translated in three frames to protein
   -prot       Synonymous with -t=prot -q=prot
   -dots=N   Output a dot every N query sequences
   -nohead   Suppresses psl five line header
   -minScore=N sets minimum score.  This is twice the matches minus the 
               mismatches minus some sort of gap penalty.  Default is 30
   -minIdentity=N Sets minimum sequence identity (in percent).  Default is
               90 for nucleotide searches, 25 for protein or translated
               protein searches.
   -out=type   Controls output file format.  Type is one of:
                   psl - Default.  Tab separated format without actual sequence
                   pslx - Tab separated format with sequence
                   axt - blastz-associated axt format
                   maf - multiz-associated maf format
                   sim4 - similar to sim4 format
                   wublast - similar to wublast format
                   blast - similar to NCBI blast format
                   blast8- NCBI blast tabular format
                   blast9 - NCBI blast tabular format with comments
   -maxIntron=N  Sets maximum intron size. Default is 750000
================================================================
========   gfServer   ====================================
================================================================
gfServer v 34x12 - Make a server to quickly find where DNA occurs in genome.
To set up a server:
   gfServer start host port file(s)
   Where the files are in .nib or .2bit format
To remove a server:
   gfServer stop host port
To query a server with DNA sequence:
   gfServer query host port probe.fa
To query a server with protein sequence:
   gfServer protQuery host port probe.fa
To query a server with translated dna sequence:
   gfServer transQuery host port probe.fa
To query a server with PCR primers:
   gfServer pcr host port fPrimer rPrimer maxDistance
To process one probe fa file against a .nib format genome (not starting server):
   gfServer direct probe.fa file(s).nib
To test pcr without starting server:
   gfServer pcrDirect fPrimer rPrimer file(s).nib
To figure out usage level:
   gfServer status host port
To get input file list:
   gfServer files host port
Options:
   -tileSize=N size of n-mers to index.  Default is 11 for nucleotides, 4 for
               proteins (or translated nucleotides).
   -stepSize=N spacing between tiles. Default is tileSize.
   -minMatch=N Number of n-mer matches that trigger detailed alignment
               Default is 2 for nucleotides, 3 for proteins.
   -maxGap=N   Number of insertions or deletions allowed between n-mers.
               Default is 2 for nucleotides, 0 for proteins.
   -trans  Translate database to protein in 6 frames.  Note: it is best
           to run this on RepeatMasked data in this case.
   -log=logFile keep a log file that records server requests.
   -seqLog    Include sequences in log file (not logged with -syslog)
   -ipLog     Include user's IP in log file (not logged with -syslog)
   -syslog    Log to syslog
   -logFacility=facility log to the specified syslog facility - default local0.
   -mask      Use masking from nib file.
   -repMatch=N Number of occurrences of a tile (nmer) that trigger repeat masking the tile.
               Default is 1024.
   -maxDnaHits=N Maximum number of hits for a dna query that are sent from the server.
               Default is 100.
   -maxTransHits=N Maximum number of hits for a translated query that are sent from the server.
               Default is 200.
   -maxNtSize=N Maximum size of untranslated DNA query sequence
               Default is 40000
   -maxAaSize=N Maximum size of protein or translated DNA queries
               Default is 8000
   -canStop If set then a quit message will actually take down the
            server

================================================================
========   gff3ToGenePred   ====================================
================================================================
gff3ToGenePred - convert a GFF3 file to a genePred file
usage:
   gff3ToGenePred inGff3 outGp
options:
  -maxParseErrors=50 - Maximum number of parsing errors before aborting. A negative
   value will allow an unlimited number of errors.  Default is 50.
  -maxConverErrors=50 - Maximum number of conversion errors before aborting. A negative
   value will allow an unlimited number of errors.  Default is 50.
  -honorStartStopCodons - only set CDS start/stop status to complete if there are
   corresponding start_stop codon records
This converts:
   - top-level gene records with mRNA records
   - top-level mRNA records
   - mRNA records that contain:
       - exon and CDS
       - CDS, five_prime_UTR, three_prime_UTR
       - only exon for non-coding
The first step is to parse the GFF3 file; up to 50 errors are reported before
aborting.  If the GFF3 file is successfully parsed, it is converted to gene
annotation.  Up to 50 conversion errors are reported before aborting.

Input file must conform to the GFF3 specification:

http://www.sequenceontology.org/gff3.shtml

================================================================
========   gtfToGenePred   ====================================
================================================================
gtfToGenePred - convert a GTF file to a genePred
usage:
   gtfToGenePred gtf genePred

options:
     -genePredExt - create an extended genePred, including frame
      information and gene name
     -allErrors - skip groups with errors rather than aborting.
      Useful for getting information about as many errors as possible.
     -infoOut=file - write a file with information on each transcript
     -sourcePrefix=pre - only process entries where the source name has the
      specified prefix.  May be repeated.
     -impliedStopAfterCds - implied stop codon is after CDS
     -simple    - just check column validity, not hierarchy, resulting genePred may be damaged
     -geneNameAsName2 - if specified, use gene_name for the name2 field
      instead of gene_id.

================================================================
========   hgWiggle   ====================================
================================================================
hgWiggle - fetch wiggle data from database or file
usage:
   hgWiggle [options] <track names ...>
options:
   -db=<database> - use specified database
   -chr=chrN - examine data only on chrN
   -chrom=chrN - same as -chr option above
   -position=[chrN:]start-end - examine data in window start-end (1-relative)
             (the chrN: is optional)
   -chromLst=<file> - file with list of chroms to examine
   -doAscii - perform the default ascii output, in addition to other outputs
            - Any of the other -do outputs turn off the default ascii output
            - ***WARNING*** this ascii output is 0-relative offset which
            - *** is *not* the normal wiggle input format.  Use the -lift
            - *** argument -lift=1 to get 1-relative offset:
   -lift=<D> - lift ascii output positions by D (0 default)
   -rawDataOut - output just the data values, nothing else
   -htmlOut - output stats or histogram in HTML instead of plain text
   -doStats - perform stats measurement, default output text, see -htmlOut
   -doBed - output bed format
   -bedFile=<file> - constrain output to ranges specified in bed <file>
   -dataConstraint='DC' - where DC is one of < = >= <= == != 'in range'
   -ll=<F> - lowerLimit compare data values to F (float) (all but 'in range')
   -ul=<F> - upperLimit compare data values to F (float)
		(need both ll and ul when 'in range')

   -help - display more examples and extra options (to stderr)

   When no database is specified, track names will refer to .wig files

   example using the file chrM.wig:
	hgWiggle chrM
   example using the database table hg17.gc5Base:
	hgWiggle -chr=chrM -db=hg17 gc5Base
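
The -lift note above amounts to adding a constant to every output position.
A minimal Python sketch of that shift follows; the function name and the
(position, value) tuple layout are illustrative assumptions, not hgWiggle's
actual output structure:

```python
def lift_positions(rows, lift=0):
    """Shift every position by `lift`, as -lift=<D> is described to do."""
    return [(pos + lift, value) for pos, value in rows]

# Hypothetical 0-relative ascii rows: (position, data value)
zero_relative = [(0, 1.5), (1, 2.0), (2, 2.5)]

# Equivalent in spirit to -lift=1: positions become 1-relative
one_relative = lift_positions(zero_relative, lift=1)
print(one_relative)
```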
================================================================
========   htmlCheck   ====================================
================================================================
htmlCheck - Do a little reading and verification of html file
usage:
   htmlCheck how url
where how is:
   ok - just check for 200 return.  Print error message and exit -1 if no 200
   getAll - read the url (header and html) and print to stdout
   getHeader - read the header and print to stdout
   getCookies - print list of cookies
   getHtml - print the html, but not the header to stdout
   getForms - print the form structure to stdout
   getVars - print the form variables to stdout
   getLinks - print links
   getTags - print out just the tags
   checkLinks - check links in page
   checkLinks2 - check links in page and all subpages in same host
             (Just one level of recursion)
   checkLocalLinks - check local links in page
   checkLocalLinks2 - check local links in page and connected local pages
             (Just one level of recursion)
   submit - submit first form in page if any using 'GET' method
   validate - do some basic validations including TABLE/TR/TD nesting
options:
   cookies=cookie.txt - Cookies is a two column file
           containing <cookieName><space><value><newLine>
note: url will need to be in quotes if it contains an ampersand.
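
The two column cookie file described above can be read with a few lines of
Python.  This is only a sketch of the stated layout
(<cookieName><space><value><newLine>); the function name is hypothetical and
skipping blank lines is an assumption:

```python
def parse_cookie_file(text):
    """Parse lines of the form '<cookieName> <value>' into a dict."""
    cookies = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines carry no cookie
        # Split on the first space only, so the value may contain spaces
        name, _, value = line.partition(" ")
        cookies[name] = value
    return cookies

sample = "sessionId abc123\nhguid 42\n"
print(parse_cookie_file(sample))
```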
================================================================
========   hubCheck   ====================================
================================================================
hubCheck - Check a track data hub for integrity.
usage:
   hubCheck http://yourHost/yourDir/hub.txt
options:
   -udcDir=/dir/to/cache - place to put cache for remote bigBed/bigWigs.
                           Will create this directory if not existing
   -verbose=2            - output verbosely
   -clear=browserMachine - clear hub status, no checking
   -noTracks             - don't check each track, just trackDb

================================================================
========   liftOver   ====================================
================================================================
liftOver - Move annotations from one assembly to another
usage:
   liftOver oldFile map.chain newFile unMapped
oldFile and newFile are in bed format by default, but can be in GFF and
maybe eventually others with the appropriate flags below.
The map.chain file has the old genome as the target and the new genome
as the query.

***********************************************************************
WARNING: liftOver was only designed to work between different
         assemblies of the same organism. It may not do what you want
         if you are lifting between different organisms. If there has
         been a rearrangement in one of the species, the size of the
         region being mapped may change dramatically after mapping.
***********************************************************************

options:
   -minMatch=0.N Minimum ratio of bases that must remap. Default 0.95
   -gff  File is in gff/gtf format.  Note that the gff lines are converted
         separately.  It would be good to have a separate check after this
         that the lines that make up a gene model still make a plausible gene
         after liftOver
   -genePred - File is in genePred format
   -sample - File is in sample format
   -bedPlus=N - File is bed N+ format
   -positions - File is in browser "position" format
   -hasBin - File has bin value (used only with -bedPlus)
   -tab - Separate by tabs rather than space (used only with -bedPlus)
   -pslT - File is in psl format, map target side only
   -minBlocks=0.N Minimum ratio of alignment blocks or exons that must map
                  (default 1.00)
   -fudgeThick    (bed 12 or 12+ only) If thickStart/thickEnd is not mapped,
                  use the closest mapped base.  Recommended if using 
                  -minBlocks.
   -multiple               Allow multiple output regions
   -minChainT, -minChainQ  Minimum chain size in target/query, when mapping
                           to multiple output regions (default 0, 0)
   -minSizeT               deprecated synonym for -minChainT (ENCODE compat.)
   -minSizeQ               Min matching region size in query with -multiple.
   -chainTable             Used with -multiple, format is db.tablename,
                               to extend chains from net (preserves dups)
   -errorHelp              Explain error messages
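
The -minMatch option is described as a minimum ratio of bases that must
remap.  A hedged Python sketch of such a test is shown here; the function
name is hypothetical and the use of >= (rather than >) is an assumption,
since the usage text does not say how ties are handled:

```python
def passes_min_match(mapped_bases, total_bases, min_match=0.95):
    """Return True if the remapped fraction reaches min_match."""
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    return mapped_bases / total_bases >= min_match

print(passes_min_match(96, 100))                 # 0.96 against the default 0.95
print(passes_min_match(90, 100))                 # 0.90 against the default 0.95
print(passes_min_match(90, 100, min_match=0.5))  # 0.90 against a relaxed 0.5
```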

================================================================
========   liftOverMerge   ====================================
================================================================
liftOverMerge - Merge multiple regions in BED 5 files
                   generated by liftOver -multiple
usage:
   liftOverMerge oldFile newFile
options:
   -mergeGap=N    Max size of gap to merge regions (default 0)
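
The effect of -mergeGap can be pictured as ordinary interval merging.  The
Python sketch below merges (chrom, start, end) regions whose gap is at most
merge_gap.  It is an illustration only: the names are hypothetical, and it
ignores the BED 5 name and score fields that the real tool reads.

```python
def merge_regions(regions, merge_gap=0):
    """Merge (chrom, start, end) regions separated by at most merge_gap bases."""
    merged = []
    for chrom, start, end in sorted(regions):
        same_chrom = merged and merged[-1][0] == chrom
        if same_chrom and start - merged[-1][2] <= merge_gap:
            prev = merged[-1]
            # Extend the previous region to cover this one
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged

regions = [("chr1", 100, 200), ("chr1", 200, 300),
           ("chr1", 350, 400), ("chr2", 0, 50)]
print(merge_regions(regions))                # adjacent regions only
print(merge_regions(regions, merge_gap=50))  # also bridges the 50 base gap
```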

================================================================
========   liftUp   ====================================
================================================================
liftUp - change coordinates of .psl, .agp, .gap, .gl, .out, .gff, .gtf, .bscore,
.tab, .gdup, .axt, .chain, .net, genePred, .wab, .bed, or .bed8 files to parent
coordinate system.

usage:
   liftUp [-type=.xxx] destFile liftSpec how sourceFile(s)
The optional -type parameter tells what type of files to lift
If omitted the type is inferred from the suffix of destFile
Type is one of the suffixes described above.
DestFile will contain the merged and lifted source files,
with the coordinates translated as per liftSpec.  LiftSpec
is tab-delimited with each line of the form:
   offset oldName oldSize newName newSize
LiftSpec may optionally have a sixth column specifying + or - strand,
but strand is not supported for all input types.
The 'how' parameter controls what the program will do with
items which are not in the liftSpec.  It must be one of:
   carry - Items not in liftSpec are carried to dest without translation
   drop  - Items not in liftSpec are silently dropped from dest
   warn  - Items not in liftSpec are dropped.  A warning is issued
   error - Items not in liftSpec generate an error
If the destination is a .agp file then a 'large inserts' file
also needs to be included in the command line:
   liftUp dest.agp liftSpec how inserts sourceFile(s)
This file describes where large inserts due to heterochromatin
should be added. Use /dev/null and set -gapsize if there is no inserts file.

options:
   -nohead  No header written for .psl files
   -dots=N Output a dot every N lines processed
   -pslQ  Lift query (rather than target) side of psl
   -axtQ  Lift query (rather than target) side of axt
   -chainQ  Lift query (rather than target) side of chain
   -netQ  Lift query (rather than target) side of net
   -wabaQ  Lift query (rather than target) side of waba alignment
   	(waba lifts only work with query side at this time)
   -nosort Don't sort bed, gff, or gdup files, to save memory
   -gapsize change contig gapsize from default
   -ignoreVersions - Ignore NCBI-style version number in sequence ids of input files
   -extGenePred lift extended genePred
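
The liftSpec format described above (offset oldName oldSize newName newSize)
can be sketched in Python as follows.  This handles plus-strand lifting only,
where the new coordinate is assumed to be the old coordinate plus the offset;
the function names and sample contig names are hypothetical.  An item missing
from the spec raises KeyError here, which loosely corresponds to the 'error'
setting of the 'how' parameter.

```python
def parse_lift_spec(text):
    """Map oldName -> (offset, oldSize, newName, newSize) from tab-delimited lines."""
    spec = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        offset, old_name, old_size, new_name, new_size = line.split("\t")[:5]
        spec[old_name] = (int(offset), int(old_size), new_name, int(new_size))
    return spec

def lift_interval(spec, name, start, end):
    """Translate an interval on `name` into the parent coordinate system."""
    offset, _old_size, new_name, _new_size = spec[name]
    return new_name, start + offset, end + offset

lift_text = "0\tctgA\t1000\tchr1\t5000\n1500\tctgB\t2000\tchr1\t5000\n"
spec = parse_lift_spec(lift_text)
print(lift_interval(spec, "ctgB", 10, 20))
```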

================================================================
========   mafSpeciesSubset   ====================================
================================================================
mafSpeciesSubset - Extract a maf that just has a subset of species.
usage:
   mafSpeciesSubset in.maf species.lst out.maf
Where:
    in.maf is a file where the sequence sources are either simple species
           names, or species.something.  In practice the part before the dot
           is usually a genome database name rather than a species.
    species.lst is a file with a list of species to keep
    out.maf is the output.  It will have columns that are all - or . in
           the reduced species set removed, as well as the lines representing
           species not in species.lst removed.
options:
   -keepFirst - If set, keep the first 'a' line in a maf no matter what
                Useful for mafFrag results where we use this for the gene name

================================================================
========   mafsInRegion   ====================================
================================================================
mafsInRegion - Extract MAFS in a genomic region
usage:
    mafsInRegion regions.bed out.maf|outDir in.maf(s)
options:
    -outDir - output separate files named by bed name field to outDir
    -keepInitialGaps - keep alignment columns at the beginning and end of a block that are gapped in all species

================================================================
========   makeTableList   ====================================
================================================================
makeTableList - create/recreate tableList tables (cache of SHOW TABLES)
usage:
   makeTableList [assemblies]
options:
   -all               recreate tableList for all assemblies
================================================================
========   nibFrag   ====================================
================================================================
nibFrag - Extract part of a nib file as .fa (all bases/gaps lower case by default)
usage:
   nibFrag [options] file.nib start end strand out.fa
where strand is + (plus) or m (minus)
options:
   -masked - use lower case characters for bases meant to be masked out
   -hardMasked - use upper case for not masked-out and 'N' characters for masked-out bases
   -upper - use upper case characters for all bases
   -name=name Use given name after '>' in output sequence
   -dbHeader=db Add full database info to the header, with or without -name option
   -tbaHeader=db Format header for compatibility with tba, takes database name as argument

================================================================
========   overlapSelect   ====================================
================================================================
overlapSelect - select records based on overlapping chromosome ranges
usage:
   overlapSelect [options] selectFile inFile outFile

Select records based on overlapping chromosome ranges.  The ranges are
specified in the selectFile, with each block specifying a range.
Records are copied from the inFile to outFile based on the selection
criteria.  Selection is based on blocks or exons rather than entire
range.

Options starting with -select* apply to selectFile and those starting
with -in* apply to inFile.

Options:
  -selectFmt=fmt - specify selectFile format:
          psl - PSL format (default for *.psl files).
          pslq - PSL format, using query instead of target
          genePred - genePred format (default for *.gp or
                     *.genePred files).
          bed - BED format (default for *.bed files).
                If BED doesn't have blocks, the bed range is used. 
          chain - chain file format (default from .chain files)
          chainq - chain file format, using query instead of target
  -selectCoordCols=spec - selectFile is tab-separated with coordinates
       as described by spec, which is one of:
            o chromCol - chrom in this column followed by start and end.
            o chromCol,startCol,endCol,strandCol,name - chrom, start, end, and
              strand in specified columns. Columns can be omitted from the end
              or left empty to not specify.
          NOTE: column numbers are zero-based
  -selectCds - Use only CDS in the selectFile
  -selectRange - Use entire range instead of blocks from records in
          the selectFile.
  -inFmt=fmt - specify inFile format, same values as -selectFmt.
  -inCoordCols=spec - inFile is tab-separated with coordinates specified by
      spec, in format described above.
  -inCds - Use only CDS in the inFile
  -inRange - Use entire range instead of blocks of records in the inFile.
  -nonOverlapping - select non-overlapping instead of overlapping records
  -strand - must be on the same strand to be considered overlapping
  -oppositeStrand - must be on the opposite strand to be considered overlapping
  -excludeSelf - don't compare records with the same coordinates and name.
      Warning: using only one of -inCds or -selectCds will result in different
      coordinates for the same record.
  -idMatch - only select overlapping records if they have the same id
  -aggregate - instead of computing overlap based on individual select entries,
      compute it based on the total number of inFile bases overlapped by selectFile
      records. -overlapSimilarity and -mergeOutput will not work with
      this option.
  -overlapThreshold=0.0 - minimum fraction of an inFile record that
      must be overlapped by a single select record to be considered
      overlapping.  Note that this is only coverage by a single select
      record, not total coverage.
  -overlapThresholdCeil=1.1 - select only inFile records with less than
      this amount of overlap with a single record, provided they are selected
      by other criteria.
  -overlapSimilarity=0.0 - minimum fraction of inFile and select records that
      must overlap.  Note that this is only coverage by a single select record
      and that it is bidirectional: inFile and selectFile must overlap by this
      amount.  A value of 1.0 will select identical records (or CDS if
      both CDS options are specified).  Not currently supported with
      -aggregate.
  -overlapSimilarityCeil=1.1 - select only inFile records with less than this
      amount of similarity with a single record, provided they are selected by
      other criteria.
  -overlapBases=-1 - minimum number of bases of overlap, < 0 disables.
  -statsOutput - output overlap statistics instead of selected records. 
      If no overlap criteria are specified, all overlapping entries are
      reported.  Otherwise only the pairs passing the criteria are
      reported. This results in a tab-separated file with the columns:
         inId selectId inOverlap selectOverlap overBases
      Where inOverlap is the fraction of the inFile record overlapped by
      the selectFile record and selectOverlap is the fraction of the
      select record overlapped by inFile records.  With -aggregate, output
      is:
         inId inOverlap inOverBases inBases
  -statsOutputAll - like -statsOutput, however output all inFile records,
      including those that are not overlapped.
  -statsOutputBoth - like -statsOutput, however output all selectFile and
      inFile records, including those that are not overlapped.
  -mergeOutput - output file will be a merge of the input file with the
      selectFile records that selected it.  The format is
         inRec<tab>selectRec.
      if multiple select records hit, inRec is repeated. This will increase
      the memory required. Not supported with -nonOverlapping or -aggregate.
  -idOutput - output a tab-separated file of pairs of
         inId selectId
      with -aggregate, only a single column of inId is written
  -dropped=file  - output rows that were dropped to this file.
  -verbose=n - verbose > 1 prints some details,
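
The inOverlap, selectOverlap and overBases columns described for -statsOutput
can be sketched for the simplest case of two single-block ranges.  This
Python example assumes zero-based half-open (start, end) ranges on the same
chromosome; the real tool works on blocks or exons, and the function name is
hypothetical.

```python
def overlap_stats(in_rec, select_rec):
    """Return (inOverlap, selectOverlap, overBases) for two (start, end) ranges."""
    in_start, in_end = in_rec
    sel_start, sel_end = select_rec
    # Bases shared by both ranges; zero when they do not touch
    over_bases = max(0, min(in_end, sel_end) - max(in_start, sel_start))
    in_overlap = over_bases / (in_end - in_start)
    select_overlap = over_bases / (sel_end - sel_start)
    return in_overlap, select_overlap, over_bases

in_overlap, select_overlap, over_bases = overlap_stats((100, 200), (150, 400))
print(in_overlap, select_overlap, over_bases)

# A record would pass -overlapThreshold=0.4 here because inOverlap is 0.5
print(in_overlap >= 0.4)
```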

================================================================
========   paraFetch   ====================================
================================================================
paraFetch - try to fetch url with multiple connections
usage:
   paraFetch N R URL {outPath}
   where N is the number of connections to use
         R is the number of retries
   outPath is optional. If not specified, it will attempt to parse URL to discover output filename.
options:
   -newer  only download a file if it is newer than the version we already have.
   -progress  Show progress of download.
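
When outPath is omitted, paraFetch is said to parse the URL to discover the
output filename.  How it does so is not documented here; the Python sketch
below shows one plausible approach (taking the last path component), with a
hypothetical function name:

```python
import posixpath
from urllib.parse import urlparse

def guess_out_path(url):
    """Guess an output filename from the last component of the URL path."""
    name = posixpath.basename(urlparse(url).path)
    if not name:
        raise ValueError("URL has no file name component: " + url)
    return name

print(guess_out_path("http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/faSize"))
```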

================================================================
========   paraSync   ====================================
================================================================
paraSync 1.0
paraSync - uses paraFetch to recursively mirror url to given path
usage:
   paraSync {options} N R URL outPath
   where N is the number of connections to use
         R is the number of retries
options:
   -A='ext1,ext2'  means accept only files with ext1 or ext2
   -newer  only download a file if it is newer than the version we already have.
   -progress  Show progress of download.

================================================================
========   pslCDnaFilter   ====================================
================================================================
pslCDnaFilter - filter cDNA alignments in psl format
usage:
   pslCDnaFilter [options] inPsl outPsl

Filter cDNA alignments in psl format.  Filtering criteria are
comparative, selecting near best in genome alignments for each
given cDNA and non-comparative, based only on the quality of an
individual alignment.

WARNING: comparative filters require that the input is sorted by
query name.  The command: 'sort -k 10,10' will do the trick.

Each alignment is assigned a score that is based on identity and
weighted towards longer alignments and those with introns.  This
can do either global or local best-in-genome selection.  Local
near best in genome keeps fragments of an mRNA that align in
discontinuous locations from other fragments.  It is useful for
unfinished genomes.  Global near best in genome keeps alignments
based on overall score.

Options:
   -algoHelp - print message describing the filtering algorithm.

   -localNearBest=-1.0 - local near best in genome filtering,
    keeping alignments within this fraction of the top score for
    each aligned portion of the mRNA. A value of zero keeps only
    the best for each fragment. A value of -1.0 disables
    (default).

   -globalNearBest=-1.0 - global near best in genome filtering,
    keeping alignments within this fraction of the top score.  A
    value of zero keeps only the best alignment.  A value of -1.0
    disables (default).

   -ignoreNs - don't include Ns (repeat masked) while calculating the
    score and coverage. That is treat them as unaligned rather than
    mismatches.  Ns are still counted as mismatches when calculating
    the identity.

   -ignoreIntrons - don't favor apparent introns when scoring.

   -minId=0.0 - only keep alignments with at least this fraction
    identity.

   -minCover=0.0 - minimum fraction of query that must be
    aligned.  If -polyASizes is specified and the query is in
    the file, the poly-A is not included in coverage
    calculation.

   -decayMinCover  -  the minimum coverage is calculated
    per alignment from the query size using the formula:
       minCoverage = 1.0 - qSize / 250.0
    and minCoverage is bounded between 0.25 and 0.9.

   -minSpan=0.0 - keep only alignments whose target length is
    at least this fraction of the longest alignment passing the
    other filters.  This can be useful for removing possible
    retroposed genes.

   -minQSize=0 - drop queries shorter than this size

   -minAlnSize=0 - minimum number of aligned bases.  This includes
    repeats, but excludes poly-A/poly-T bases if available.

   -minNonRepSize=0 - Minimum number of matching bases that are not repeats.
    This does not include mismatches.
    Must use -repeats on BLAT if doing unmasked alignments.

   -maxRepMatch=1.0 - Maximum fraction of matching bases
    that are repeats.  Must use -repeats on BLAT if doing
    unmasked alignments.

   -maxAligns=-1 - maximum number of alignments for a given query. If
    exceeded, then alignments are sorted by score and only this number
    will be saved.  A value of -1 disables (default)

   -polyASizes=file - tab-separated file with information about
    poly-A tails and poly-T heads.  Format is outputted by
    faPolyASizes:

        id seqSize tailPolyASize headPolyTSize

   -usePolyTHead - if a poly-T head was detected and is longer
    than the poly-A tail, it is used when calculating coverage
    instead of the poly-A tail.

   -bestOverlap - filter overlapping alignments, keeping the best of
    alignments that are similar.  This is designed to be used with
    overlapping, windowed alignments, where one alignment might be truncated.
    Does not discard ones with weird overlap unless -filterWeirdOverlapped
    is specified.

   -hapRegions=psl - PSL format alignments of each haplotype pseudo-chromosome
    to the corresponding reference chromosome region.  This is used to map
    alignments between regions.

   -dropped=psl - save psls that were dropped to this file.

   -weirdOverlapped=psl - output weirdly overlapping PSLs to
    this file.

   -filterWeirdOverlapped - Filter weirdly overlapped alignments, keeping
    the single highest scoring one or an arbitrary one if multiple with
    the same high score.

   -alignStats=file - output the per-alignment statistics to this file

   -uniqueMapped - keep only cDNAs that are uniquely aligned after all
    other filters have been applied.

   -noValidate - don't run pslCheck validation.

   -verbose=1 - 0: quiet
                1: output stats
                2: list problem alignment (weird or invalid)
                3: list dropped alignments and reason for dropping
                4: list kept psl and info
                5: info about all PSLs

   -hapRefMapped=psl - output PSLs of haplotype to reference chromosome
    cDNA alignments mappings (for debugging purposes).

   -hapRefCDnaAlns=psl - output PSLs of haplotype cDNA to reference cDNA
    alignments (for debugging purposes).

   -alnIdQNameMode - add internal assigned alignment numbers to cDNA names
    on output.  Useful for debugging, as they are included in the verbose
    tracing as [#1], etc.  Will make a mess of normal production usage.

   -blackList=file.txt - adds a list of accession ranges to a black list.
    Any accession on this list is dropped. Black list file is two columns
    where the first column is the beginning of the range, and the second
    column is the end of the range, inclusive.

The default options don't do any filtering. If no filtering
criteria are specified, all PSLs will be passed through, except
those that are internally inconsistent.

THE INPUT MUST BE SORTED BY QUERY for the comparative filters.
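
The -decayMinCover formula given above can be worked through in a few lines
of Python.  The function name is hypothetical; the formula and the 0.25 to
0.9 bounds are taken from the option description.

```python
def decay_min_cover(q_size):
    """minCoverage = 1.0 - qSize / 250.0, bounded between 0.25 and 0.9."""
    return min(0.9, max(0.25, 1.0 - q_size / 250.0))

# Short queries hit the upper bound, long queries the lower bound
for q_size in (10, 50, 100, 1000):
    print(q_size, round(decay_min_cover(q_size), 3))
```

Under this formula any query of 188 bases or more ends up with the 0.25
floor, and any query of 25 bases or fewer with the 0.9 ceiling.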

================================================================
========   pslPretty   ====================================
================================================================
pslPretty - Convert PSL to human readable output
usage:
   pslPretty in.psl target.lst query.lst pretty.out
options:
   -axt - save in something like Scott Schwartz's axt format
          Note gaps in both sequences are still allowed in the
          output which not all axt readers will expect
   -dot=N Put out a dot every N records
   -long - Don't abbreviate long inserts
   -check=fileName - Output alignment checks to filename
It's a really good idea if the psl file is sorted by target
if it contains multiple targets.  Otherwise this will be
very very slow.   The target and query lists can either be
fasta, 2bit or nib files, or a list of fasta, 2bit and/or nib files
one per line

================================================================
========   pslReps   ====================================
================================================================
pslReps - analyse repeats and generate genome wide best
alignments from a sorted set of local alignments
usage:
    pslReps in.psl out.psl out.psr
where in.psl is an alignment file generated by psLayout and
sorted by pslSort, out.psl is the best alignment output
and out.psr contains repeat info
options:
    -nohead don't add PSL header
    -ignoreSize Will not weigh in favor of larger alignments so much
    -noIntrons Will not penalize for not having introns when calculating
              size factor
    -singleHit  Takes single best hit, not splitting into parts
    -minCover=0.N minimum coverage to output.  Default is 0.
    -ignoreNs Ignore 'N's when calculating minCover.
    -minAli=0.N minimum alignment ratio
               default is 0.93
    -nearTop=0.N how much can deviate from top and be taken
               default is 0.01
    -minNearTopSize=N  Minimum size of alignment that is near top
               for alignment to be kept.  Default 30.
    -coverQSizes=file Tab-separated file with effective query sizes.
                     When used with -minCover, this allows polyAs
                     to be excluded from the coverage calculation

================================================================
========   pslSort   ====================================
================================================================
pslSort - merge and sort psCluster .psl output files
usage:
  pslSort dirs[1|2] outFile tempDir inDir(s)
This will sort all of the .psl files in the directories
inDirs in two stages - first into temporary files in tempDir
and second into outFile.  The device on tempDir needs to have
enough space (typically 15-20 gigabytes if processing whole genome)
  pslSort g2g[1|2] outFile tempDir inDir(s)
This will sort a genome to genome alignment, reflecting the
alignments across the diagonal.

Adding 1 or 2 after the dirs or g2g will limit the program to
only the first or second pass respectively of the sort

Options:
   -nohead - do not write psl header
   -verbose=N Set verbosity level, higher for more output. Default 1

================================================================
========   sizeof   ====================================
================================================================
     type   bytes    bits
     char	1	8
unsigned char	1	8
short int	2	16
u short int	2	16
      int	4	32
 unsigned	4	32
     long	8	64
unsigned long	8	64
long long	8	64
u long long	8	64
   size_t	8	64
   void *	8	64
    float	4	32
   double	8	64
long double	16	128
LITTLE ENDIAN machine detected
byte order: normal order: 0x12345678 in memory: 0x78563412
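
The byte order line above can be reproduced with Python's struct module,
which packs the same 32-bit value in an explicitly chosen byte order.  This
is an illustration of the concept, not how sizeof itself is implemented:

```python
import struct
import sys

value = 0x12345678

# "<I" packs an unsigned 32-bit int little-endian, ">I" big-endian
little = struct.pack("<I", value)
big = struct.pack(">I", value)

print("little endian in memory: 0x" + little.hex())
print("big endian in memory:    0x" + big.hex())
print("this machine is:", sys.byteorder)
```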
================================================================
========   stringify   ====================================
================================================================
stringify - Convert file to C strings
usage:
   stringify [options] in.txt
A stringified version of in.txt  will be printed to standard output.

Options:
  -var=varname - create a variable with the specified name containing
                 the string.
  -static - create the variable as a string array.

================================================================
========   textHistogram   ====================================
================================================================
textHistogram - Make a histogram in ascii
usage:
   textHistogram [options] inFile
Where inFile contains one number per line.
  options:
   -binSize=N - Size of bins, default 1
   -maxBinCount=N - Maximum # of bins, default 25
   -minVal=N - Minimum value to put in histogram, default 0
   -log - Do log transformation before plotting
   -noStar - Don't draw asterisks
   -col=N - Which column to use. Default 1
   -aveCol=N - A second column to average over. The averages
             will be output in place of counts of primary column.
   -real - Data input are real values (default is integer)
   -autoScale=N - autoscale to N # of bins
   -probValues - show prob-Values (density and cum.distr.) (sets -noStar too)
   -freq - show frequencies instead of counts
   -skip=N - skip N lines before starting, default 0

================================================================
========   twoBitInfo   ====================================
================================================================
twoBitInfo - get information about sequences in a .2bit file
usage:
   twoBitInfo input.2bit output.tab
options:
   -nBed   instead of seq sizes, output BED records that define 
           areas with N's in sequence
   -noNs   outputs the length of each sequence, but does not count Ns 
Output file has the columns:
   seqName size

The 2bit file may be specified in the form path:seq or path:seq1,seq2,seqN...
so that information is returned only on the requested sequence(s).
If the form path:seq:start-end is used, start-end is ignored.

================================================================
========   twoBitToFa   ====================================
================================================================
twoBitToFa - Convert all or part of .2bit file to fasta
usage:
   twoBitToFa input.2bit output.fa
options:
   -seq=name - restrict this to just one sequence
   -start=X  - start at given position in sequence (zero-based)
   -end=X - end at given position in sequence (non-inclusive)
   -seqList=file - file containing list of the desired sequence names 
                    in the format seqSpec[:start-end], e.g. chr1 or chr1:0-189
                    where coordinates are half-open zero-based, i.e. [start,end)
   -noMask - convert sequence to all upper case
   -bpt=index.bpt - use bpt index instead of built in one
   -bed=input.bed - grab sequences specified by input.bed. Will exclude introns

Sequence and range may also be specified as part of the input
file name using the syntax:
      /path/input.2bit:name
   or
      /path/input.2bit:name:start-end
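
The zero-based, half-open [start,end) convention used by -seqList and the
name:start-end syntax maps directly onto Python slicing.  The sketch below
parses a seqSpec[:start-end] string and extracts the matching bases from an
in-memory string; the function names and the toy sequence are illustrative
and no .2bit file is read.

```python
def parse_seq_spec(spec):
    """Parse 'name' or 'name:start-end' into (name, start, end)."""
    name, sep, rng = spec.partition(":")
    if not sep:
        return name, None, None  # whole sequence requested
    start_text, _, end_text = rng.partition("-")
    start, end = int(start_text), int(end_text)
    if start < 0 or end < start:
        raise ValueError("invalid range in " + spec)
    return name, start, end

def extract(seq, start, end):
    """Return bases in [start, end): start is included, end is not."""
    return seq[start:end]

name, start, end = parse_seq_spec("chr1:0-189")
print(name, start, end, "length:", end - start)
print(extract("ACGTACGT", 2, 5))
```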

================================================================
========   validateFiles   ====================================
================================================================
validateFiles - Validate format of different track input files
                Program exits with non-zero status if any errors detected
                  otherwise exits with zero status
                Use filename 'stdin' to read from stdin
                Files can be in .gz, .bz2, .zip, .Z format and are 
                  automatically decompressed
                Multiple input files of the same type can be listed
                Error messages are written to stderr
                OK or failing file lines can be optionally written to stdout
usage:
   validateFiles -type=FILE_TYPE file1 [file2 [...]]
options:
   -type=(a value from the list below)
         tagAlign|pairedTagAlign|broadPeak|narrowPeak|gappedPeak|bedGraph
                   : see http://genomewiki.cse.ucsc.edu/EncodeDCC/index.php/File_Formats
         fasta     : Fasta files (only one line of sequence, and no quality scores)
         fastq     : Fasta with quality scores (see http://maq.sourceforge.net/fastq.shtml)
         csfasta   : Colorspace fasta (implies -colorSpace) (see link below)
         csqual    : Colorspace quality (see link below)
                     (see http://marketing.appliedbiosystems.com/mk/submit/SOLID_KNOWLEDGE_RD?_JS=T&rd=dm)
         BAM       : Binary Alignment/Map
                     (see http://samtools.sourceforge.net/SAM1.pdf)
         bigWig    : Big Wig
                     (see http://genome.ucsc.edu/goldenPath/help/bigWig.html)
         bigBedN[+[P]]: 
                     (see http://genome.ucsc.edu/goldenPath/help/bigBed.html)
         bedN[+[P]] : 
                     (see http://genome.ucsc.edu/FAQ/FAQformat.html#format1)
                         N is between 3 and 15, 
                         optional (+) if extra "bedPlus" fields, 
                         optional P specifies the number of extra fields. Not required, but preferred.
                      Examples: -type=bed6 or -type=bed6+ or -type=bed6+3 

   -as=fields.as                If you have extra "bedPlus" fields, define each field on its own
                                  row in AutoSql format here. Applies to bed-related types.
   -tab - If set, expect fields to be tab separated, normally
           expects white space separator. Applies to bed-related types.
   -chromDb=db                  Specify DB containing chromInfo table to validate chrom names
                                  and sizes
   -chromInfo=file.txt          Specify chromInfo file to validate chrom names and sizes
   -colorSpace                  Sequences include colorspace values [0-3] (can be used 
                                  with formats such as tagAlign and pairedTagAlign)
   -genome=path/to/hg18.2bit    Validate tagAlign or pairedTagAlign sequences match genome
                                  in .2bit file
   -mismatches=n                Maximum number of mismatches in sequence (or read pair) if 
                                  validating tagAlign or pairedTagAlign files
   -mismatchTotalQuality=n      Maximum total quality score at mismatching positions
   -matchFirst=n                only check the first N bases of the sequence
   -mmPerPair                   Check that neither read of a pair exceeds the mismatch count if
                                  validating pairedTagAlign files (default is the total for the pair)
   -mmCheckOneInN=n             Check mismatches in only one in 'n' lines (default=1, all)
   -nMatch                      N's do not count as a mismatch
   -privateData                 Private data so empty sequence is tolerated
   -isSorted                    Input is sorted by chrom, only affects types tagAlign and pairedTagAlign
   -allowOther                  allow chromosomes that aren't native in BAMs
   -allowBadLength              allow chromosomes that have the wrong length in BAM
   -complementMinus             complement the query sequence on the minus strand (for testing BAM)
   -showBadAlign                show non-compliant alignments
   -bamPercent=N.N              percentage of BAM alignments that must be compliant

   -doReport                    output report in filename.report
   -version                     Print version
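
The -type=bedN[+[P]] notation above packs three pieces of information into
one string: the number of standard BED fields, whether extra "bedPlus"
fields follow, and optionally how many. The following Python sketch shows
one way to decode it. It is an illustration only, not code from
validateFiles; BedType and parse_bed_type are hypothetical names, and
applying the 3 to 15 range to bigBedN as well is an assumption.

```python
import re
from typing import NamedTuple, Optional

# Matches bed6, bed6+, bed6+3, bigBed12+ and similar type strings.
_BED_TYPE = re.compile(r"^(bigBed|bed)(\d+)(\+(\d+)?)?$")


class BedType(NamedTuple):
    big: bool                    # True for bigBedN, False for bedN
    standard_fields: int         # N
    plus: bool                   # True when a '+' is present
    extra_fields: Optional[int]  # P, or None when not given


def parse_bed_type(text: str) -> BedType:
    """Decode a bedN[+[P]] type string (hypothetical helper)."""
    match = _BED_TYPE.match(text)
    if match is None:
        raise ValueError(f"not a bed type: {text!r}")
    n = int(match.group(2))
    if not 3 <= n <= 15:
        raise ValueError(f"N must be between 3 and 15, got {n}")
    plus = match.group(3) is not None
    extra = int(match.group(4)) if match.group(4) else None
    return BedType(match.group(1) == "bigBed", n, plus, extra)
```

For the three examples given above, bed6 has no extra fields, bed6+ has an
unspecified number of extra fields, and bed6+3 has exactly three.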

================================================================
========   wigCorrelate   ====================================
================================================================
wigCorrelate - Produce a table that correlates all pairs of wigs.
usage:
   wigCorrelate one.wig two.wig ... n.wig
This works on bigWig as well as wig files.
The output is to stdout
options:
   -clampMax=N - values larger than this are clipped to this value
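
To illustrate what -clampMax does to the input before correlation, here is
a small Python sketch. The usage text only says the tool "correlates" wigs;
the use of the Pearson coefficient here is an assumption, and the functions
clamp and pearson are hypothetical helpers working on plain lists of values
rather than on wig files.

```python
import math
from typing import List, Optional, Sequence


def clamp(values: Sequence[float], clamp_max: Optional[float]) -> List[float]:
    """Clip every value larger than clamp_max down to clamp_max."""
    if clamp_max is None:
        return list(values)
    return [min(v, clamp_max) for v in values]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length value lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # A constant signal has no defined correlation; report 0.0 here.
        return 0.0
    return sxy / math.sqrt(sxx * syy)
```

Clamping limits the influence of a few extreme values: a single very large
spike in one track can otherwise dominate the result.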

================================================================
========   wigToBigWig   ====================================
================================================================
wigToBigWig v 4 - Convert ascii format wig file (in fixedStep, variableStep
or bedGraph format) to binary big wig format.
usage:
   wigToBigWig in.wig chrom.sizes out.bw
Where in.wig is in one of the ascii wiggle formats, but not including track lines
and chrom.sizes is two column: <chromosome name> <size in bases>
and out.bw is the output indexed big wig file.
Use the script fetchChromSizes to obtain the actual chrom.sizes information
from UCSC; please do not make up chrom sizes from your own information.
options:
   -blockSize=N - Number of items to bundle in r-tree.  Default 256
   -itemsPerSlot=N - Number of data points bundled at lowest level. Default 1024
   -clip - If set just issue warning messages rather than dying if wig
                  file contains items off end of chromosome.
   -unc - If set, do not use compression.
================================================================
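
The chrom.sizes file described above has two columns, chromosome name and
size in bases, and the -clip option concerns items that run off the end of
a chromosome. The following Python sketch reads such a table and detects
those items. It is an illustration under stated assumptions, not code from
wigToBigWig: parse_chrom_sizes and runs_off_end are hypothetical names, and
item coordinates are assumed to be zero-based and half-open. The sizes in
the usage note are toy values; real ones should come from fetchChromSizes.

```python
from typing import Dict


def parse_chrom_sizes(text: str) -> Dict[str, int]:
    """Read '<chromosome name> <size in bases>' lines into a dict."""
    sizes: Dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, size = line.split()[:2]
        sizes[name] = int(size)
    return sizes


def runs_off_end(chrom: str, start: int, end: int, sizes: Dict[str, int]) -> bool:
    """True when the item [start, end) extends past the chromosome end.

    Raises KeyError if the chromosome is not listed in sizes.
    """
    return end > sizes[chrom]
```

For example, with a toy table containing "chr2 500", an item ending at 501
would be reported as running off the end, while one ending at 500 would not.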
