bioinformatics « 小生这厢有礼了(BioFaceBook Personal Blog)

HSPIR: A manually annotated Heat Shock Protein Information Resource

Summary: HSPIR is a concerted database of six major Heat Shock Proteins (HSPs), namely Hsp70, Hsp40, Hsp60, Hsp90, Hsp100 and sHsp (small HSP). The HSPs are essential for the survival of all living organisms, as they protect the conformations of proteins upon exposure to various stress conditions. They are a highly conserved group of proteins involved in […]

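Notes on ordination analysis of ecological data with the Vegan package

"Ordination analysis of ecological data using the Vegan package. Lai Jiangshan, Mi Xiangcheng (State Key Laboratory of Vegetation and Environmental Change, Institute of Botany, Chinese Academy of Sciences, Beijing 100093). Abstract: Community data are generally multidimensional, for example the attributes of species or of environmental factors. Multivariate statistical analysis is a common approach in community ecology, and ordination is one of its most widely used methods. CANOCO is widely used ordination software, but it is expensive commercial software and new versions appear slowly. In recent years R, being flexible, open, easy to learn and free, has rapidly won over researchers in ecology and biodiversity. The contributed R package "Vegan" is a dedicated tool for community ecology analysis. Vegan provides all the basic ordination methods, produces high-quality ordination diagrams, and is updated frequently. We believe Vegan can fully replace CANOCO as the preferred statistical tool for ordination analysis. This paper first outlines the principles and types of ordination, then introduces Vegan and how to download and install it. Finally, using 40 randomly selected 20 m × 20 m quadrats from the 24-ha Gutianshan plot as an example, it demonstrates the ordination methods commonly used in Vegan (PCA, RDA, CA and CCA) and how to draw ordination diagrams, in the hope of helping R beginners quickly become familiar with Vegan for ordination analysis."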
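To make the idea of ordination concrete, here is a minimal PCA sketch in base R. It is not the workflow from the paper: the site-by-species matrix `comm` is randomly generated, hypothetical data, and `prcomp()` stands in for Vegan's own functions (called without constraints, Vegan's `rda()` also performs a PCA).

```r
# Hypothetical site-by-species abundance matrix: 10 sites (rows) x 5 species (columns)
set.seed(1)
comm <- matrix(rpois(50, lambda = 5), nrow = 10,
               dimnames = list(paste0("site", 1:10), paste0("sp", 1:5)))

# PCA on centred, unscaled abundances
pca <- prcomp(comm, center = TRUE, scale. = FALSE)

# Proportion of total variance carried by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Site scores on the first two ordination axes
site_scores <- pca$x[, 1:2]

# Biplot: sites drawn as labels, species loadings drawn as arrows
biplot(pca, main = "PCA of hypothetical community data")
```

With real data, `comm` would be replaced by a table such as `gtsdata`; constrained methods (RDA, CCA) additionally need the environmental table.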

gtsdata

gtsenv.txt

赖江山.pdf

```
> setwd("/winxp_disk2/shenzy/R/Vegan")
> gtsdata=read.table("gtsdata.txt", header=T)
> gtsenv=read.table("gtsenv.txt", header=T)
> install.packages("vegan")
Installing package(s) into ‘/home/shenzy/R/x86_64-pc-linux-gnu-library/2.15’
(as ‘lib’ is unspecified)
trying URL 'http://cran.csiro.au/src/contrib/vegan_2.0-4.tar.gz'
Content type 'application/x-gzip' length 1576584 bytes (1.5 Mb)
opened URL
==================================================
downloaded 1.5 Mb

* installing *source* package ‘vegan’ …
** package ‘vegan’ successfully unpacked and MD5 sums checked
** libs
gfortran -fpic -O3 […]
```

heatplot_w_dendrogram.R Script

```r
## load the required libraries
library(made4)
require(graphics)
library(cluster)
library(stats)

#### read data in from a comma-delimited file with a header
array_data <- read.csv("Overall_Outsidetreat_sort_top100.csv", header = TRUE,
                       row.names = 1, check.names = TRUE)
## array_data is a data frame
## rows are taxa_strings and columns are samples

# now transpose rows and columns
# array_data <- t(array_data)
## now rows are chips and columns are taxa

dim(array_data)
```

[…]
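The core of a heatmap with dendrograms is two separate hierarchical clusterings, one for rows and one for columns. A minimal base-R sketch, assuming a hypothetical random matrix `taxa_by_sample` laid out like `array_data` (taxa in rows, samples in columns) and using `stats::heatmap()` rather than the made4 functions the script loads:

```r
# Hypothetical table: 10 taxa (rows) x 6 samples (columns)
set.seed(42)
taxa_by_sample <- matrix(rnorm(60), nrow = 10,
                         dimnames = list(paste0("taxon", 1:10), paste0("sample", 1:6)))

# Cluster rows (taxa) and columns (samples) separately, Euclidean distance, average linkage
row_hc <- hclust(dist(taxa_by_sample), method = "average")
col_hc <- hclust(dist(t(taxa_by_sample)), method = "average")

# Draw the heatmap with both dendrograms; scale = "row" standardises each taxon
hm <- heatmap(taxa_by_sample,
              Rowv = as.dendrogram(row_hc),
              Colv = as.dendrogram(col_hc),
              scale = "row")
```

Passing the dendrograms explicitly makes the clustering choices (distance, linkage) visible instead of relying on the defaults.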

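R Plotting Basics (4): Heatmaps

After analyzing differential expression data, we often want an intuitive way to display the results: the heatmap. This section uses microarray (gene chip) data as an example to show how to produce high-quality heatmaps.

For example:

Heatmap with a steel-blue-to-white color gradient

As before, let's start with the simplest heatmap.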

```r
library(ggplot2)
library(ALL)    # the data package can be installed with biocLite("ALL")
data("ALL")
library(limma)
# keep only the BCR/ABL and ALL1/AF4 samples
eset <- ALL[, ALL$mol.biol %in% c("BCR/ABL", "ALL1/AF4")]
f <- factor(as.character(eset$mol.biol))
design <- model.matrix(~f)
fit <- eBayes(lmFit(eset, design))  # analyze the microarray data to get differential expression statistics
selected <- p.adjust(fit$p.value[, 2]) < 0.001
esetSel <- eset[selected, ]  # select a subset of genes for the heatmap
dim(esetSel)  # a moderate number of genes; if there are too many, split them across two plots
## Features  Samples
##       84       47
library(hgu95av2.db)
data <- exprs(esetSel)
probes <- rownames(data)
symbol <- mget(probes, hgu95av2SYMBOL, ifnotfound = NA)
symbol <- do.call(rbind, symbol)
# fall back to the probe ID where no gene symbol is available
symbol[is.na(symbol[, 1]), 1] <- rownames(symbol)[is.na(symbol[, 1])]
```

[…]
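The steel-blue-to-white color scheme mentioned above can be approximated in base R with `colorRampPalette()`. This sketch uses a hypothetical random matrix `expr` in place of `esetSel`, and mapping low values to white and high values to steel blue is an assumption:

```r
# Hypothetical expression matrix: 20 genes (rows) x 8 samples (columns)
set.seed(7)
expr <- matrix(rnorm(160, mean = 8, sd = 2), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("s", 1:8)))

# 64-step gradient from white (low values) to steel blue (high values)
pal <- colorRampPalette(c("white", "steelblue"))(64)

# Row-scaled heatmap using the custom palette; margins leave room for labels
heatmap(expr, col = pal, scale = "row", margins = c(5, 6))
```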

R Plotting Basics (3): Axis Breaks

In R, axis breaks are usually produced with a few functions from the plotrix package, namely axis.break(), gap.plot(), gap.barplot() and gap.boxplot(). For example:

gap plot

```r
library(plotrix)
opar <- par(mfrow = c(3, 2))
plot(sample(5:7, 20, replace = TRUE), main = "Axis break test", ylim = c(2, 8))
axis.break(axis = 2, breakpos = 2.5, style = "gap")
axis.break(axis = 2, breakpos = 3.5, style = "slash")
axis.break(axis = 2, breakpos = 4.5, style = "zigzag")
twogrp <- c(rnorm(5) + 4, rnorm(5) + 20, rnorm(5) + 5, rnorm(5) + 22)
gap.plot(twogrp, gap = c(8, 16, 25, 35),
         xlab = "X values", ylab = "Y values", xlim = c(1, 30), ylim = c(0, 25),
         main = "Test two gap plot with the lot", xtics = seq(0, 30, by = 5),
         ytics = c(4, 6, 18, 20, 22, 38, 40, 42),
         lty = c(rep(1, 10), rep(2, 10)),
         pch = c(rep(2, 10), rep(3, 10)),
         col = c(rep(2, 10), rep(3, 10)),
         type = "b")
gap.plot(21:30, rnorm(10) + 40, gap = c(8, 16, 25, 35), add = TRUE,
         lty = rep(3, 10), col = rep(4, 10), type = "l")
gap.barplot(twogrp, gap = c(8, 16), xlab = "Index", ytics = c(3, 6, 17, 20),
            ylab = "Group values", main = "Barplot […]
```
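The idea behind a gap plot can also be sketched by hand in base R: values above the gap are shifted down by the gap width, and the y axis is relabelled with the original values. This only illustrates the principle and is not how plotrix implements gap.plot(); the data `y`, the gap `c(8, 16)` and the tick positions are hypothetical:

```r
# Hypothetical data with two clusters far apart (similar in spirit to twogrp)
set.seed(3)
y <- c(rnorm(10) + 4, rnorm(10) + 20)

gap <- c(8, 16)            # y range to cut out of the axis
gap_width <- diff(gap)

# Shift every value above the gap down by the gap width
y_plot <- ifelse(y > gap[2], y - gap_width, y)

# Suppress the default y axis and draw one labelled with the original values
plot(y_plot, yaxt = "n", xlab = "Index", ylab = "Y values", main = "Manual gap plot sketch")
ticks_low  <- c(2, 4, 6)
ticks_high <- c(18, 20, 22)
axis(2, at = c(ticks_low, ticks_high - gap_width), labels = c(ticks_low, ticks_high))

# Dashed line marking where the axis was cut
abline(h = gap[1], lty = "dashed", col = "grey")
```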

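R Plotting Basics (2): Dot Histograms

The chart names in the previous section were somewhat inconsistent, so from this section on I will use the following (incomplete) naming convention: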

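English name: Chinese name (meaning)
bar: 条形图 (bar chart)
line: 线图 (line chart)
area: 面积图 (area chart)
pie: 饼图 (pie chart)
high-low: 高低图 (high-low chart)
pareto: 帕累托图 (Pareto chart)
control: 控制图 (control chart)
boxplot: 箱线图 (box plot)
error bar: 误差条图 (error bar chart)
scatter: 散点图 (scatter plot)
P-P: P-P正态概率图 (P-P normal probability plot)
Q-Q: Q-Q正态概率图 (Q-Q normal probability plot)
sequence: 序列图 (sequence chart)
ROC Curve: ROC分类效果曲线图 (ROC curve of classification performance)
Time Series: 时间序列图 (time series plot)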

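Now, back to the topic. What is a dot histogram? I previously called it a beeswarm plot, and it is also known as a jitter plot. Whatever the name, here on 糗世界 I will call it a dot histogram.

Let's first see what a dot histogram looks like:

Dot histogram

Here is the code: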

```r
require(beeswarm)
data(breast)
head(breast)
##            ER ESR1 ERBB2 time_survival event_survival
## 100.CEL.gz neg […]
```
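If the beeswarm package is not available, base R's `stripchart()` gives a rough approximation of a dot histogram, either by jittering points or by stacking tied values. The two groups below are hypothetical random data standing in for the `breast` data set:

```r
# Hypothetical measurements for two groups
set.seed(11)
values <- c(rnorm(30, mean = 5), rnorm(30, mean = 7))
group  <- factor(rep(c("neg", "pos"), each = 30))

op <- par(mfrow = c(1, 2))
# Jitter: random horizontal offsets so overlapping points separate
stripchart(values ~ group, vertical = TRUE, method = "jitter", jitter = 0.2,
           pch = 16, main = "Jitter plot")
# Stack: values are rounded so ties occur, then tied points are placed side by side
stripchart(round(values, 1) ~ group, vertical = TRUE, method = "stack",
           pch = 16, main = "Stacked dot plot")
par(op)
```

Unlike beeswarm, neither method avoids overlaps systematically, so this is only a quick substitute.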

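R Plotting Basics (1): Layout, Colors, etc.

1. Layout

The area R uses for a plot is divided into two main parts: the outer margins and the plotting region.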

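The outer margins are set with the oma argument of par(). For example, oma=c(4,3,2,1) sets the outer margins to 4 lines at the bottom, 3 on the left, 2 at the top and 1 on the right; the order starts at the x axis (bottom) and goes clockwise. Here a "line" is the height needed to display one line of normal-size text. So when using the line argument of mtext(), the value should lie in the half-open interval [0, n), where n is the number of lines in that margin. To write text in the outer margin with mtext(), set outer=TRUE.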
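A minimal sketch of the outer-margin settings described above; the plotted data and the text strings are placeholders:

```r
# Outer margins: bottom 4, left 3, top 2, right 1 lines (clockwise from the x axis)
op <- par(oma = c(4, 3, 2, 1))
outer_margins <- par("oma")

plot(1:10, main = "Plot inside outer margins")
# outer = TRUE writes into the outer margin; line stays within [0, n) for that side
mtext("Text in the bottom outer margin (line = 1)", side = 1, line = 1, outer = TRUE)
mtext("Text in the top outer margin (line = 0.5)", side = 3, line = 0.5, outer = TRUE)
# Dotted box around the whole device region, to make the outer margins visible
box("outer", lty = "dotted", col = "grey")

par(op)  # restore the previous settings
```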

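The plotting region is laid out with the mfrow and mfcol arguments of par(), which divide it into several sub-regions. The default is mfrow=c(1,1).

For example, mfrow=c(2,3) divides the plotting region into 2 rows and 3 columns, and plots fill the cells row by row; mfcol=c(3,2) divides it into 3 rows and 2 columns, and plots fill the cells column by column.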
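The difference between `mfrow` and `mfcol` is the fill order. This sketch draws six placeholder panels each way so that the panel numbers show the order:

```r
# 2 rows x 3 columns, filled row by row
op <- par(mfrow = c(2, 3))
row_layout <- par("mfrow")
for (i in 1:6) plot(1:10, main = paste("mfrow panel", i))
par(op)

# 3 rows x 2 columns, filled column by column
op <- par(mfcol = c(3, 2))
col_layout <- par("mfcol")
for (i in 1:6) plot(1:10, main = paste("mfcol panel", i))
par(op)
```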

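Each of these sub-regions is in turn divided into two parts: the figure margins and the main plot area.

The figure margins hold the axes, axis labels and title. Usually we need only one x axis and one y axis, so the bottom and left margins are generally set larger; the top or right margin only needs enlarging when there are additional x or y axes. Figure margins are set with the mar argument of par(). As with the outer margins, mar=c(4,3,2,1) sets a bottom margin of 4 lines, a left margin of 3, a top margin of 2 and a right margin of 1, again going clockwise from the x axis, with "line" meaning the same as before. Alternatively, use mai, which differs from mar only in that its unit is inches instead of lines.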
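A minimal sketch of setting figure margins in lines with `mar` and in inches with `mai`; the margin values are arbitrary examples:

```r
# Figure margins in lines: bottom, left, top, right
op <- par(mar = c(4, 3, 2, 1))
mar_lines  <- par("mar")
mai_inches <- par("mai")   # the same margins, reported in inches
plot(1:10, main = "mar = c(4, 3, 2, 1)")
par(op)

# Figure margins set directly in inches
op <- par(mai = c(1, 0.8, 0.6, 0.2))
plot(1:10, main = "mai = c(1, 0.8, 0.6, 0.2) inches")
par(op)
```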

```r
SOUTH <- 1; WEST <- 2; NORTH <- 3; EAST <- 4

GenericFigure <- function(ID, size1, size2) {
  plot(0:10, 0:10, type = "n", xlab = "X", ylab = "Y")
  text(5, 5, ID, col = "red", cex = size1)
  box("plot", col = "red")
  mtext(paste("cex", size2, sep = ""), SOUTH, line = 3, adj = 1.0, cex = size2, col = "blue")
  title(paste("title", ID, sep = ""))
}

MultipleFigures <- function() {
  GenericFigure("1", 3, 0.5)
  box("figure", lty = "dotted", col = "blue")
  GenericFigure("2", 3, 1)
  box("figure", lty = "dotted", col = "blue")
  GenericFigure("3", […]
```

An R package Suite for Microarray Meta-analysis in Quality Control, Differentially Expressed Gene Analysis and Pathway Enrichment Detection

Abstract

Summary: With the rapid advances and prevalence of high-throughput genomic technologies, integrating information of multiple relevant genomic studies has brought new challenges. Microarray meta-analysis has become a frequently used tool in biomedical research. Little effort, […]

RSeQC: quality control of RNA-seq experiments

Abstract

Motivation: RNA-seq has been extensively used for transcriptome study. Quality control (QC) is critical to ensure that RNA-seq data are of high quality and suitable for subsequent analyses. However, QC is a time-consuming and complex task, due to the massive size and versatile nature of RNA-seq data. Therefore, a convenient and comprehensive QC […]

BioContext: an integrated text mining system for large-scale extraction and contextualization of biomolecular events

http://www.biocontext.org/

Motivation: Although the amount of data in biology is rapidly increasing, critical information for understanding biological events like phosphorylation or gene expression remains locked in the biomedical literature. Most current text mining (TM) approaches to extract information about biological events are focused on either limited-scale studies and/or abstracts, with data extracted lacking context […]
