Positive-Unlabeled Learning for Disease Gene Identification

Background: Identifying disease genes from the human genome is an important but challenging task in biomedical research. Machine learning methods can be applied to discover new disease genes based on the known ones. Existing machine learning methods typically use the known disease genes as the positive training set P and the unknown genes as the […]

RazerS 3: Faster, fully sensitive read mapping

Motivation: In recent years, next-generation sequencing (NGS) has become a key technology for many applications in the biomedical sciences. Throughput continues to increase and new protocols provide longer reads than currently available. In almost all applications, read mapping is a first step. Hence, it is crucial to have algorithms and implementations that perform fast, […]

FacPad: Bayesian Sparse Factor Modeling for the Inference of Pathways Responsive to Drug Treatment

Motivation: It is well recognized that the effects of drugs go far beyond targeting individual proteins and instead influence the complex interactions among many relevant biological pathways. Genome-wide expression profiling before and after drug treatment has become a powerful approach for capturing a global snapshot of cellular response to drugs, as well as to […]

Qualimap: evaluating next generation sequencing alignment data

Motivation: The sequence alignment/map (SAM) and the binary alignment/map (BAM) formats have become the standard method of representation of nucleotide sequence alignments for next-generation sequencing data. SAM/BAM files usually contain information from tens to hundreds of millions of reads. Often, the sequencing technology, protocol, and/or the selected mapping algorithm introduce some unwanted biases in […]

MEGA-CC: Computing Core of Molecular Evolutionary Genetics Analysis program for automated and iterative data analysis

Summary: There is a growing need in the research community to apply the Molecular Evolutionary Genetics Analysis (MEGA) software tool for batch processing a large number of datasets and to integrate it into analysis workflows. We now make available the computing core of the MEGA software as a stand-alone executable (MEGA-CC), along with an […]

HSPIR: A manually annotated Heat Shock Protein Information Resource

Summary: HSPIR is a concerted database of six major Heat Shock Proteins (HSPs), namely Hsp70, Hsp40, Hsp60, Hsp90, Hsp100 and sHsp (small HSP). The HSPs are essential for the survival of all living organisms, as they protect the conformations of proteins upon exposure to various stress conditions. They are a highly conserved group of proteins involved in […]

An R package Suite for Microarray Meta-analysis in Quality Control, Differentially Expressed Gene Analysis and Pathway Enrichment Detection

Summary: With the rapid advances and prevalence of high-throughput genomic technologies, integrating information from multiple relevant genomic studies has brought new challenges. Microarray meta-analysis has become a frequently used tool in biomedical research. Little effort, […]

MetaPhlAn: Metagenomic Phylogenetic Analysis

MetaPhlAn is a computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data. MetaPhlAn relies on unique clade-specific marker genes identified from 3,000 reference genomes, allowing:

up to 25,000 reads-per-second (on one CPU) analysis speed (orders of magnitude faster compared to existing methods); unambiguous taxonomic assignments as the MetaPhlAn markers are […]

RSeQC: quality control of RNA-seq experiments

Motivation: RNA-seq has been extensively used for transcriptome study. Quality control (QC) is critical to ensure that RNA-seq data are of high quality and suitable for subsequent analyses. However, QC is a time-consuming and complex task, due to the massive size and versatile nature of RNA-seq data. Therefore, a convenient and comprehensive QC […]

BioContext: an integrated text mining system for large-scale extraction and contextualization of biomolecular events

http://www.biocontext.org/

Motivation: Although the amount of data in biology is rapidly increasing, critical information for understanding biological events like phosphorylation or gene expression remains locked in the biomedical literature. Most current text mining (TM) approaches for extracting information about biological events focus on limited-scale studies and/or abstracts, with the extracted data lacking context […]