<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>小生这厢有礼了(BioFaceBook Personal Blog) &#187; R语言</title>
	<atom:link href="https://www.biofacebook.com/?cat=7&#038;feed=rss2" rel="self" type="application/rss+xml" />
	<link>https://www.biofacebook.com</link>
	<description>记录生物信息学点滴足迹（NGS,Genome,Meta,Linux)</description>
	<lastBuildDate>Sun, 23 Aug 2020 03:28:53 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.41</generator>
	<item>
		<title>heatmap R</title>
		<link>https://www.biofacebook.com/?p=1439</link>
		<comments>https://www.biofacebook.com/?p=1439#comments</comments>
		<pubDate>Tue, 18 Feb 2020 04:55:37 +0000</pubDate>
		<dc:creator><![CDATA[szypanther]]></dc:creator>
				<category><![CDATA[R语言]]></category>

		<guid isPermaLink="false">http://www.biofacebook.com/?p=1439</guid>
		<description><![CDATA[<p>&#160;</p> <p>&#62; library(gplots)</p> <p>Attaching package: ‘gplots’</p> <p>The following object is masked from ‘package:stats’:</p> <p>lowess</p> <p>&#62; &#62; setwd(&#8220;/home/zyshen/work/QM_nanjing&#8221;) &#62; data2&#60;-read.csv(&#8220;combined_example.level_5.csv&#8221;, header=T, sep=&#8221;,&#8221;) &#62; data2plot&#60;-data.matrix(data2[2:3]) &#62; row.names(data2plot)&#60;-data2[,1] &#62; heatmap.2(data2plot,trace=&#8221;none&#8221;,cexCol = 2,col=greenred(50), margins = c(5, 40), sepwidth=c(0.05,0.05))</p> <p>&#160;</p> <p>&#160;</p> <p>&#160;</p> <p>if (!require(&#8220;gplots&#8221;)) { install.packages(&#8220;gplots&#8221;, dependencies = TRUE) library(gplots) } if (!require(&#8220;RColorBrewer&#8221;)) { install.packages(&#8220;RColorBrewer&#8221;, dependencies = TRUE) library(RColorBrewer) } [...]]]></description>
				<content:encoded><![CDATA[<p>&nbsp;</p>
<p>&gt; library(gplots)</p>
<p>Attaching package: ‘gplots’</p>
<p>The following object is masked from ‘package:stats’:</p>
<p>lowess</p>
<p>&gt;<br />
&gt; setwd(&#8220;/home/zyshen/work/QM_nanjing&#8221;)<br />
&gt; data2&lt;-read.csv(&#8220;combined_example.level_5.csv&#8221;, header=T, sep=&#8221;,&#8221;)<br />
&gt; data2plot&lt;-data.matrix(data2[2:3])<br />
&gt; row.names(data2plot)&lt;-data2[,1]<br />
&gt; heatmap.2(data2plot,trace=&#8221;none&#8221;,cexCol  = 2,col=greenred(50), margins = c(5, 40), sepwidth=c(0.05,0.05))<a href="http://www.biofacebook.com/wp-content/uploads/2020/02/Rplot.png"><img class="aligncenter size-large wp-image-1440" src="http://www.biofacebook.com/wp-content/uploads/2020/02/Rplot-1024x795.png" alt="Rplot" width="640" height="497" /></a></p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>if (!require(&#8220;gplots&#8221;)) {<br />
install.packages(&#8220;gplots&#8221;, dependencies = TRUE)<br />
library(gplots)<br />
}<br />
if (!require(&#8220;RColorBrewer&#8221;)) {<br />
install.packages(&#8220;RColorBrewer&#8221;, dependencies = TRUE)<br />
library(RColorBrewer)<br />
}<br />
#########################################################<br />
### B) Reading in data and transform it into matrix format<br />
#########################################################<br />
setwd(&#8220;/home/zyshen/Downloads/酵母代谢&#8221;)<br />
data &lt;- read.csv(&#8220;W29-knockout-0.1-SD-10X-internal.val.filter.csv&#8221;, comment.char=&#8221;#&#8221;, header=T)<br />
rnames &lt;- data[,1] # assign labels in column 1 to &#8220;rnames&#8221;<br />
mat_data &lt;- data.matrix(data[,2:ncol(data)]) # transform column 2-5 into a matrix<br />
rownames(mat_data) &lt;- rnames # assign row names<br />
my_palette &lt;- colorRampPalette(c(&#8220;red&#8221;, &#8220;yellow&#8221;, &#8220;green&#8221;))(n = 299)</p>
<p># (optional) defines the color breaks manually for a &#8220;skewed&#8221; color transition<br />
col_breaks = c(seq(-50,0,length=100), # for red<br />
seq(0.1,20,length=100), # for yellow<br />
seq(21,100,length=100)) # for green</p>
<p>heatmap.2(mat_data,<br />
cellnote = mat_data, # same data set for cell labels<br />
main = &#8220;Correlation&#8221;, # heat map title<br />
notecol=&#8221;black&#8221;, # change font color of cell labels to black<br />
density.info=&#8221;none&#8221;, # turns off density plot inside color legend<br />
trace=&#8221;none&#8221;, # turns off trace lines inside the heat map<br />
margins =c(8,20), # widens margins around plot<br />
col=my_palette, # use on color palette defined earlier<br />
breaks=col_breaks, # enable color transition at specified limits<br />
dendrogram=&#8221;row&#8221;, # only draw a row dendrogram<br />
Colv=&#8221;NA&#8221;)</p>
<p><a href="http://www.biofacebook.com/wp-content/uploads/2020/02/RplotW29.png"><img class="aligncenter size-large wp-image-1507" src="http://www.biofacebook.com/wp-content/uploads/2020/02/RplotW29-1024x846.png" alt="RplotW29" width="640" height="529" /></a></p>
<p>&nbsp;</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>https://www.biofacebook.com/?feed=rss2&#038;p=1439</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Correlation tests, correlation matrix, and corresponding visualization methods in R (forward)</title>
		<link>https://www.biofacebook.com/?p=1373</link>
		<comments>https://www.biofacebook.com/?p=1373#comments</comments>
		<pubDate>Wed, 26 Jun 2019 04:46:28 +0000</pubDate>
		<dc:creator><![CDATA[szypanther]]></dc:creator>
				<category><![CDATA[R语言]]></category>
		<category><![CDATA[统计学习]]></category>

		<guid isPermaLink="false">http://www.biofacebook.com/?p=1373</guid>
		<description><![CDATA[<p>https://rstudio-pubs-static.s3.amazonaws.com/240657_5157ff98e8204c358b2118fa69162e18.html</p> Correlation tests, correlation matrix, and corresponding visualization methods in R Igor Hut 12 January, 2017 Install and load required R packages Methods for correlation analyses Compute correlation in R R functions Preliminary considerations Preleminary test to check the test assumptions Pearson correlation test Kendall rank correlation test Spearman rank correlation coefficient How to interpret [...]]]></description>
				<content:encoded><![CDATA[<p>https://rstudio-pubs-static.s3.amazonaws.com/240657_5157ff98e8204c358b2118fa69162e18.html</p>
<div id="header" class="fluid-row">
<h1 class="title toc-ignore">Correlation tests, correlation matrix, and corresponding visualization methods in R</h1>
<h4 class="author"><em>Igor Hut</em></h4>
<h4 class="date"><em>12 January, 2017</em></h4>
</div>
<div id="TOC">
<ul>
<li><a href="https://rstudio-pubs-static.s3.amazonaws.com/240657_5157ff98e8204c358b2118fa69162e18.html#install-and-load-required-r-packages">Install and load required R packages</a></li>
<li><a href="https://rstudio-pubs-static.s3.amazonaws.com/240657_5157ff98e8204c358b2118fa69162e18.html#methods-for-correlation-analyses">Methods for correlation analyses</a></li>
<li><a href="https://rstudio-pubs-static.s3.amazonaws.com/240657_5157ff98e8204c358b2118fa69162e18.html#compute-correlation-in-r">Compute correlation in R</a>
<ul>
<li><a href="https://rstudio-pubs-static.s3.amazonaws.com/240657_5157ff98e8204c358b2118fa69162e18.html#r-functions">R functions</a></li>
</ul>
</li>
<li><a href="https://rstudio-pubs-static.s3.amazonaws.com/240657_5157ff98e8204c358b2118fa69162e18.html#preliminary-considerations">Preliminary considerations</a>
<ul>
<li><a href="https://rstudio-pubs-static.s3.amazonaws.com/240657_5157ff98e8204c358b2118fa69162e18.html#preleminary-test-to-check-the-test-assumptions">Preleminary test to check the test assumptions</a></li>
</ul>
</li>
<li><a href="https://rstudio-pubs-static.s3.amazonaws.com/240657_5157ff98e8204c358b2118fa69162e18.html#pearson-correlation-test">Pearson correlation test</a></li>
<li><a href="https://rstudio-pubs-static.s3.amazonaws.com/240657_5157ff98e8204c358b2118fa69162e18.html#kendall-rank-correlation-test">Kendall rank correlation test</a></li>
<li><a href="https://rstudio-pubs-static.s3.amazonaws.com/240657_5157ff98e8204c358b2118fa69162e18.html#spearman-rank-correlation-coefficient">Spearman rank correlation coefficient</a></li>
<li><a href="https://rstudio-pubs-static.s3.amazonaws.com/240657_5157ff98e8204c358b2118fa69162e18.html#how-to-interpret-correlation-coefficient">How to interpret correlation coefficient</a></li>
<li><a href="https://rstudio-pubs-static.s3.amazonaws.com/240657_5157ff98e8204c358b2118fa69162e18.html#what-is-a-correlation-matrix">What is a correlation matrix?</a></li>
<li><a href="https://rstudio-pubs-static.s3.amazonaws.com/240657_5157ff98e8204c358b2118fa69162e18.html#compute-correlation-matrix-in-r">Compute correlation matrix in R</a>
<ul>
<li><a href="https://rstudio-pubs-static.s3.amazonaws.com/240657_5157ff98e8204c358b2118fa69162e18.html#plain-correlation-matrix">Plain correlation matrix</a></li>
<li><a href="https://rstudio-pubs-static.s3.amazonaws.com/240657_5157ff98e8204c358b2118fa69162e18.html#correlation-matrix-with-significance-levels-p-value">Correlation matrix with significance levels (p-value)</a></li>
</ul>
</li>
<li><a href="https://rstudio-pubs-static.s3.amazonaws.com/240657_5157ff98e8204c358b2118fa69162e18.html#custom-function-for-convinient-formatting-of-the-correlation-matrix">Custom function for convinient formatting of the correlation matrix</a></li>
<li><a href="https://rstudio-pubs-static.s3.amazonaws.com/240657_5157ff98e8204c358b2118fa69162e18.html#visualization-of-a-correlation-matrix">Visualization of a correlation matrix</a>
<ul>
<li><a href="https://rstudio-pubs-static.s3.amazonaws.com/240657_5157ff98e8204c358b2118fa69162e18.html#use-symnum-function-symbolic-number-coding">Use <code>symnum()</code> function: Symbolic number coding</a></li>
<li><a href="https://rstudio-pubs-static.s3.amazonaws.com/240657_5157ff98e8204c358b2118fa69162e18.html#use-the-corrplot-function-draw-a-correlogram">Use the <code>corrplot()</code> function: Draw a correlogram</a>
<ul>
<li><a href="https://rstudio-pubs-static.s3.amazonaws.com/240657_5157ff98e8204c358b2118fa69162e18.html#correlogram-layouts">Correlogram layouts:</a></li>
<li><a href="https://rstudio-pubs-static.s3.amazonaws.com/240657_5157ff98e8204c358b2118fa69162e18.html#reordering-the-correlation-matrix">Reordering the correlation matrix</a></li>
<li><a href="https://rstudio-pubs-static.s3.amazonaws.com/240657_5157ff98e8204c358b2118fa69162e18.html#changing-the-color-and-direction-of-text-labels-in-the-correlogram">Changing the color and direction of text labels in the correlogram</a></li>
<li><a href="https://rstudio-pubs-static.s3.amazonaws.com/240657_5157ff98e8204c358b2118fa69162e18.html#combining-correlogram-with-the-significance-test">Combining correlogram with the significance test</a></li>
<li><a href="https://rstudio-pubs-static.s3.amazonaws.com/240657_5157ff98e8204c358b2118fa69162e18.html#fine-tuning-customization-of-the-correlogram">Fine tuning customization of the correlogram</a></li>
</ul>
</li>
<li><a href="https://rstudio-pubs-static.s3.amazonaws.com/240657_5157ff98e8204c358b2118fa69162e18.html#use-chart.correlation-draw-scatter-plots">Use <code>chart.Correlation()</code>: Draw scatter plots</a></li>
<li><a href="https://rstudio-pubs-static.s3.amazonaws.com/240657_5157ff98e8204c358b2118fa69162e18.html#use-heatmap">Use <code>heatmap()</code></a></li>
</ul>
</li>
</ul>
</div>
<p><strong>The following content is mostly compiled (with some original additions on my side) from the material that can be found at<a class="uri" href="http://www.sthda.com/">http://www.sthda.com/</a>, as well as in the vignette for the <code>corrplot</code> R package &#8211; <a href="https://cran.r-project.org/web/packages/corrplot/vignettes/corrplot-intro.html">An Introduction to corrplot Package</a>. The sole purpose of this text is to put all the info into one document in an easy to search format. Since I’m a huge fan of Hadley Wickham’s work I’ll insist on solutions based in <a href="https://blog.rstudio.org/2016/09/15/tidyverse-1-0-0/">“tidyverse”</a> whenewer possible…</strong></p>
<div id="install-and-load-required-r-packages" class="section level3">
<h3>Install and load required R packages</h3>
<p>We’ll use the <code>ggpubr</code> R package for an easy ggplot2-based data visualization, <code>corrplot</code> package to plot correlograms, <code>Hmisc</code> to calculate correlation matrices containing both cor. coefs. and p-values,<code>corrplot</code> for plotting correlograms, and of course <code>tidyverse</code> for all the data wrangling, plotting and alike:</p>
<pre class="r"><code class="r"><span class="keyword">require</span><span class="paren">(</span><span class="identifier">ggpubr</span><span class="paren">)</span>
<span class="keyword">require</span><span class="paren">(</span><span class="identifier">tidyverse</span><span class="paren">)</span>
<span class="keyword">require</span><span class="paren">(</span><span class="identifier">Hmisc</span><span class="paren">)</span>
<span class="keyword">require</span><span class="paren">(</span><span class="identifier">corrplot</span><span class="paren">)</span></code></pre>
</div>
<div id="methods-for-correlation-analyses" class="section level2">
<h2>Methods for correlation analyses</h2>
<p>There are different methods to perform correlation analysis:</p>
<ul>
<li><strong>Pearson correlation (r)</strong>, which measures a linear dependence between two variables (x and y). It’s also known as a parametric correlation test because it depends to the distribution of the data. It can be used only when x and y are from normal distribution. The plot of <em>y = f(x)</em> is named the <em>linear regression curve</em>.</li>
<li><strong>Kendall <span class="math inline"><span id="MathJax-Element-1-Frame" class="MathJax" style="box-sizing: border-box; display: inline; font-style: normal; font-weight: normal; line-height: normal; font-size: 14px; text-indent: 0px; text-align: left; text-transform: none; letter-spacing: normal; word-spacing: normal; word-wrap: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0px; min-height: 0px; border: 0px; padding: 0px; margin: 0px; position: relative;" tabindex="0" data-mathml="&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;mi&gt;&amp;#x03C4;&lt;/mi&gt;&lt;/math&gt;"><span id="MathJax-Span-1" class="math"><span id="MathJax-Span-2" class="mrow"><span id="MathJax-Span-3" class="mi">τ</span></span></span><span class="MJX_Assistive_MathML">τ</span></span></span></strong> and <strong>Spearman <span class="math inline"><span id="MathJax-Element-2-Frame" class="MathJax" style="box-sizing: border-box; display: inline; font-style: normal; font-weight: normal; line-height: normal; font-size: 14px; text-indent: 0px; text-align: left; text-transform: none; letter-spacing: normal; word-spacing: normal; word-wrap: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0px; min-height: 0px; border: 0px; padding: 0px; margin: 0px; position: relative;" tabindex="0" data-mathml="&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;mi&gt;&amp;#x03C1;&lt;/mi&gt;&lt;/math&gt;"><span id="MathJax-Span-4" class="math"><span id="MathJax-Span-5" class="mrow"><span id="MathJax-Span-6" class="mi">ρ</span></span></span><span class="MJX_Assistive_MathML">ρ</span></span></span></strong>, which are rank-based correlation coefficients (non-parametric)</li>
<li><strong>The most commonly used method is the Pearson correlation method</strong></li>
</ul>
</div>
<div id="compute-correlation-in-r" class="section level2">
<h2>Compute correlation in R</h2>
<div id="r-functions" class="section level3">
<h3>R functions</h3>
<p>Correlation coefficients can be computed in R by using the functions <code>cor()</code> and <code>cor.test()</code>:</p>
<blockquote>
<ul>
<li><code>cor()</code> computes the correlation coefficient</li>
<li><code>cor.test()</code> test for association/correlation between paired samples. It returns both the correlation coefficient and the significance level(or p-value) of the correlation.</li>
</ul>
</blockquote>
<p><em>The simplified formats are:</em></p>
<pre class="r"><code class="r"><span class="identifier">cor</span><span class="paren">(</span><span class="identifier">x</span>, <span class="identifier">y</span>, <span class="identifier">method</span> <span class="operator">=</span> <span class="identifier">c</span><span class="paren">(</span><span class="string">"pearson"</span>, <span class="string">"kendall"</span>, <span class="string">"spearman"</span><span class="paren">)</span><span class="paren">)</span>
<span class="identifier">cor.test</span><span class="paren">(</span><span class="identifier">x</span>, <span class="identifier">y</span>, <span class="identifier">method</span><span class="operator">=</span><span class="identifier">c</span><span class="paren">(</span><span class="string">"pearson"</span>, <span class="string">"kendall"</span>, <span class="string">"spearman"</span><span class="paren">)</span><span class="paren">)</span></code></pre>
<p>where:</p>
<blockquote>
<ul>
<li>x, y: numeric vectors with the same length</li>
<li>method: correlation method</li>
</ul>
</blockquote>
<p><strong>If the data contain missing values, the following R code can be used to handle missing values by case-wise deletion:</strong></p>
<pre class="r"><code class="r"><span class="identifier">cor</span><span class="paren">(</span><span class="identifier">x</span>, <span class="identifier">y</span>,  <span class="identifier">method</span> <span class="operator">=</span> <span class="string">"pearson"</span>, <span class="identifier">use</span> <span class="operator">=</span> <span class="string">"complete.obs"</span><span class="paren">)</span></code></pre>
</div>
</div>
<div id="preliminary-considerations" class="section level2">
<h2>Preliminary considerations</h2>
<p>We’ll use the well known built-in <code>mtcars</code> R dataset.</p>
<pre class="r"><code class="r"><span class="identifier">head</span><span class="paren">(</span><span class="identifier">mtcars</span><span class="paren">)</span></code></pre>
<pre><code>##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1</code></pre>
<blockquote><p>We’d like to compute the correlation between <code>mpg</code> and <code>wt</code> variables.</p></blockquote>
<p><strong>First let’s visualise our data by the means of a scatter plot. We’ll be using <code>ggpubr</code> R package</strong></p>
<pre class="r"><code class="r"><span class="keyword">library</span><span class="paren">(</span><span class="identifier">ggpubr</span><span class="paren">)</span>

<span class="identifier">my_data</span> <span class="operator">&lt;-</span> <span class="identifier">mtcars</span>
<span class="identifier">my_data</span><span class="operator">$</span><span class="identifier">cyl</span> <span class="operator">&lt;-</span> <span class="identifier">factor</span><span class="paren">(</span><span class="identifier">my_data</span><span class="operator">$</span><span class="identifier">cyl</span><span class="paren">)</span>
<span class="identifier">str</span><span class="paren">(</span><span class="identifier">my_data</span><span class="paren">)</span></code></pre>
<pre><code>## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...</code></pre>
<pre class="r"><code class="r"><span class="identifier">ggscatter</span><span class="paren">(</span><span class="identifier">my_data</span>, <span class="identifier">x</span> <span class="operator">=</span> <span class="string">"wt"</span>, <span class="identifier">y</span> <span class="operator">=</span> <span class="string">"mpg"</span>,
          <span class="identifier">add</span> <span class="operator">=</span> <span class="string">"reg.line"</span>, <span class="identifier">conf.int</span> <span class="operator">=</span> <span class="literal">TRUE</span>,
          <span class="identifier">cor.coef</span> <span class="operator">=</span> <span class="literal">TRUE</span>, <span class="identifier">cor.method</span> <span class="operator">=</span> <span class="string">"pearson"</span>,
          <span class="identifier">xlab</span> <span class="operator">=</span> <span class="string">"Weight (1000 lbs)"</span>, <span class="identifier">ylab</span> <span class="operator">=</span> <span class="string">"Miles/ (US) gallon"</span><span class="paren">)</span></code></pre>
<p>&nbsp;</p>
<div id="preleminary-test-to-check-the-test-assumptions" class="section level3">
<h3>Preleminary test to check the test assumptions</h3>
<ol>
<li>Is the relation between variables linear? Yes, from the plot above, the relationship can be, closely enough, modeled as linear. In the situation where the scatter plots show curved patterns, we are dealing with nonlinear association between the two variables.</li>
<li>Are the data from each of the 2 variables (x, y) following a normal distribution?
<ul>
<li>Use <em>Shapiro-Wilk</em> normality test <span class="math inline"><span id="MathJax-Element-3-Frame" class="MathJax" style="box-sizing: border-box; display: inline; font-style: normal; font-weight: normal; line-height: normal; font-size: 14px; text-indent: 0px; text-align: left; text-transform: none; letter-spacing: normal; word-spacing: normal; word-wrap: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0px; min-height: 0px; border: 0px; padding: 0px; margin: 0px; position: relative;" tabindex="0" data-mathml="&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;mo stretchy=&quot;false&quot;&gt;&amp;#x2192;&lt;/mo&gt;&lt;/math&gt;"><span id="MathJax-Span-7" class="math"><span id="MathJax-Span-8" class="mrow"><span id="MathJax-Span-9" class="mo">→</span></span></span><span class="MJX_Assistive_MathML">→</span></span></span> R function: <code>shapiro.test()</code></li>
<li>and look at the normality plot <span class="math inline"><span id="MathJax-Element-4-Frame" class="MathJax" style="box-sizing: border-box; display: inline; font-style: normal; font-weight: normal; line-height: normal; font-size: 14px; text-indent: 0px; text-align: left; text-transform: none; letter-spacing: normal; word-spacing: normal; word-wrap: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0px; min-height: 0px; border: 0px; padding: 0px; margin: 0px; position: relative;" tabindex="0" data-mathml="&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;mo stretchy=&quot;false&quot;&gt;&amp;#x2192;&lt;/mo&gt;&lt;/math&gt;"><span id="MathJax-Span-10" class="math"><span id="MathJax-Span-11" class="mrow"><span id="MathJax-Span-12" class="mo">→</span></span></span><span class="MJX_Assistive_MathML">→</span></span></span> R function: <code>ggpubr::ggqqplot()</code></li>
</ul>
</li>
</ol>
<ul>
<li><em>Shapiro-Wilk</em> test can be performed as follow:
<ul>
<li>Null hypothesis: the data are normally distributed</li>
<li>Alternative hypothesis: the data are not normally distributed</li>
</ul>
</li>
</ul>
<pre class="r"><code class="r"><span class="comment"># Shapiro-Wilk normality test for mpg</span>
<span class="identifier">shapiro.test</span><span class="paren">(</span><span class="identifier">my_data</span><span class="operator">$</span><span class="identifier">mpg</span><span class="paren">)</span> <span class="comment"># =&gt; p = 0.1229</span></code></pre>
<pre><code>## 
##  Shapiro-Wilk normality test
## 
## data:  my_data$mpg
## W = 0.94756, p-value = 0.1229</code></pre>
<pre class="r"><code class="r"><span class="comment"># Shapiro-Wilk normality test for wt</span>
<span class="identifier">shapiro.test</span><span class="paren">(</span><span class="identifier">my_data</span><span class="operator">$</span><span class="identifier">wt</span><span class="paren">)</span> <span class="comment"># =&gt; p = 0.09</span></code></pre>
<pre><code>## 
##  Shapiro-Wilk normality test
## 
## data:  my_data$wt
## W = 0.94326, p-value = 0.09265</code></pre>
<p><em>As can be seen from the output, the two p-values are greater than the predetermined significance level of 0.05 implying that the distribution of the data are not significantly different from normal distribution. In other words, we can assume the normality.</em></p>
<ul>
<li>One more option for checking the normality of the data distribution is visual inspection of the Q-Q plots (quantile-quantile plots). Q-Q plot draws the correlation between a given sample and the theoretical normal distribution.</li>
</ul>
<p>Again, we’ll use the <code>ggpubr</code> R package to obtain “pretty”, i.e. publishing-ready, Q-Q plots.</p>
<pre class="r"><code class="r"><span class="keyword">library</span><span class="paren">(</span><span class="string">"ggpubr"</span><span class="paren">)</span>
<span class="comment"># Check for the normality of "mpg""</span>
<span class="identifier">ggqqplot</span><span class="paren">(</span><span class="identifier">my_data</span><span class="operator">$</span><span class="identifier">mpg</span>, <span class="identifier">ylab</span> <span class="operator">=</span> <span class="string">"MPG"</span><span class="paren">)</span></code></pre>
<p>&nbsp;</p>
<pre class="r"><code class="r"><span class="comment"># Check for the normality of "wt""</span>
<span class="identifier">ggqqplot</span><span class="paren">(</span><span class="identifier">my_data</span><span class="operator">$</span><span class="identifier">wt</span>, <span class="identifier">ylab</span> <span class="operator">=</span> <span class="string">"WT"</span><span class="paren">)</span></code></pre>
<p>&nbsp;</p>
<p><em>From the Q-Q normality plots, we can assume that both samples may come from populations that, closely enough, follow normal distributions.</em></p>
<p><strong>It is important to note that if the data does not follow the normal distribution, at least closely enough, it’s recommended to use the non-parametric correlation, including Spearman and Kendall rank-based correlation tests.</strong></p>
</div>
</div>
<div id="pearson-correlation-test" class="section level2">
<h2>Pearson correlation test</h2>
<p><strong>Example:</strong></p>
<pre class="r"><code class="r"><span class="identifier">res</span> <span class="operator">&lt;-</span> <span class="identifier">cor.test</span><span class="paren">(</span><span class="identifier">my_data</span><span class="operator">$</span><span class="identifier">wt</span>, <span class="identifier">my_data</span><span class="operator">$</span><span class="identifier">mpg</span>, <span class="identifier">method</span> <span class="operator">=</span> <span class="string">"pearson"</span><span class="paren">)</span>
<span class="identifier">res</span></code></pre>
<pre><code>## 
##  Pearson's product-moment correlation
## 
## data:  my_data$wt and my_data$mpg
## t = -9.559, df = 30, p-value = 1.294e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.9338264 -0.7440872
## sample estimates:
##        cor 
## -0.8676594</code></pre>
<p>So what’s happening here? First of all let’s clarify the meaning of this printout:</p>
<ul>
<li><code>t</code> is the <em>t-test</em> statistic value <code>(t = -9.559)</code>,</li>
<li><code>df</code> is the degrees of freedom <code>(df= 30)</code>,</li>
<li><code>p-value</code> is the significance level of the <em>t-test</em> <code>(p-value =</code> <span class="math inline"><span id="MathJax-Element-5-Frame" class="MathJax" style="box-sizing: border-box; display: inline; font-style: normal; font-weight: normal; line-height: normal; font-size: 14px; text-indent: 0px; text-align: left; text-transform: none; letter-spacing: normal; word-spacing: normal; word-wrap: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0px; min-height: 0px; border: 0px; padding: 0px; margin: 0px; position: relative;" tabindex="0" data-mathml="&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;msup&gt;&lt;mn&gt;1.29410&lt;/mn&gt;&lt;mrow class=&quot;MJX-TeXAtom-ORD&quot;&gt;&lt;mo&gt;&amp;#x2212;&lt;/mo&gt;&lt;mn&gt;10&lt;/mn&gt;&lt;/mrow&gt;&lt;/msup&gt;&lt;/math&gt;"><span id="MathJax-Span-13" class="math"><span id="MathJax-Span-14" class="mrow"><span id="MathJax-Span-15" class="msubsup"><span id="MathJax-Span-16" class="mn">1.29410</span><span id="MathJax-Span-17" class="texatom"><span id="MathJax-Span-18" class="mrow"><span id="MathJax-Span-19" class="mo">−</span><span id="MathJax-Span-20" class="mn">10</span></span></span></span></span></span><span class="MJX_Assistive_MathML">1.29410−10</span></span></span><code>)</code>.</li>
<li><code>conf.int</code> is the <em>confidence interval</em> of the correlation coefficient at 95% <code>(conf.int = [-0.9338, -0.7441])</code>;</li>
<li><code>sample estimates</code> is the <em>correlation coefficient</em> <code>(Cor.coeff = -0.87)</code>.</li>
</ul>
<p><strong>Interpretation of the results:</strong> As can be see from the results above the <em>p-value</em> of the test is <span class="math inline"><span id="MathJax-Element-6-Frame" class="MathJax" style="box-sizing: border-box; display: inline; font-style: normal; font-weight: normal; line-height: normal; font-size: 14px; text-indent: 0px; text-align: left; text-transform: none; letter-spacing: normal; word-spacing: normal; word-wrap: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0px; min-height: 0px; border: 0px; padding: 0px; margin: 0px; position: relative;" tabindex="0" data-mathml="&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;msup&gt;&lt;mn&gt;1.29410&lt;/mn&gt;&lt;mrow class=&quot;MJX-TeXAtom-ORD&quot;&gt;&lt;mo&gt;&amp;#x2212;&lt;/mo&gt;&lt;mn&gt;10&lt;/mn&gt;&lt;/mrow&gt;&lt;/msup&gt;&lt;/math&gt;"><span id="MathJax-Span-21" class="math"><span id="MathJax-Span-22" class="mrow"><span id="MathJax-Span-23" class="msubsup"><span id="MathJax-Span-24" class="mn">1.29410</span><span id="MathJax-Span-25" class="texatom"><span id="MathJax-Span-26" class="mrow"><span id="MathJax-Span-27" class="mo">−</span><span id="MathJax-Span-28" class="mn">10</span></span></span></span></span></span><span class="MJX_Assistive_MathML">1.29410−10</span></span></span>, which is less than the significance level<span class="math inline"><span id="MathJax-Element-7-Frame" class="MathJax" style="box-sizing: border-box; display: inline; font-style: normal; font-weight: normal; line-height: normal; font-size: 14px; text-indent: 0px; text-align: left; text-transform: none; letter-spacing: normal; word-spacing: normal; word-wrap: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0px; min-height: 0px; border: 0px; padding: 0px; margin: 0px; position: relative;" tabindex="0" data-mathml="&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;mi&gt;&amp;#x03B1;&lt;/mi&gt;&lt;mo&gt;=&lt;/mo&gt;&lt;mn&gt;0.05&lt;/mn&gt;&lt;/math&gt;"><span id="MathJax-Span-29" class="math"><span id="MathJax-Span-30" class="mrow"><span id="MathJax-Span-31" class="mi">α</span><span id="MathJax-Span-32" class="mo">=</span><span id="MathJax-Span-33" class="mn">0.05</span></span></span><span class="MJX_Assistive_MathML">α=0.05</span></span></span>. We can conclude that <code>wt</code> and <code>mpg</code> are significantly correlated with a correlation coefficient of -0.87 and <em>p-value</em> of <span class="math inline"><span id="MathJax-Element-8-Frame" class="MathJax" style="box-sizing: border-box; display: inline; font-style: normal; font-weight: normal; line-height: normal; font-size: 14px; text-indent: 0px; text-align: left; text-transform: none; letter-spacing: normal; word-spacing: normal; word-wrap: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0px; min-height: 0px; border: 0px; padding: 0px; margin: 0px; position: relative;" tabindex="0" data-mathml="&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;msup&gt;&lt;mn&gt;1.29410&lt;/mn&gt;&lt;mrow class=&quot;MJX-TeXAtom-ORD&quot;&gt;&lt;mo&gt;&amp;#x2212;&lt;/mo&gt;&lt;mn&gt;10&lt;/mn&gt;&lt;/mrow&gt;&lt;/msup&gt;&lt;/math&gt;"><span id="MathJax-Span-34" class="math"><span id="MathJax-Span-35" class="mrow"><span id="MathJax-Span-36" class="msubsup"><span id="MathJax-Span-37" class="mn">1.29410</span><span id="MathJax-Span-38" class="texatom"><span id="MathJax-Span-39" class="mrow"><span id="MathJax-Span-40" class="mo">−</span><span id="MathJax-Span-41" class="mn">10</span></span></span></span></span></span><span class="MJX_Assistive_MathML">1.29410−10</span></span></span>.</p>
<p><strong>Access to the values returned by <code>cor.test()</code> function</strong></p>
<p>The function <code>cor.test()</code> returns a list containing the following components:</p>
<pre class="r"><code class="r"><span class="identifier">str</span><span class="paren">(</span><span class="identifier">res</span><span class="paren">)</span></code></pre>
<pre><code>## List of 9
##  $ statistic  : Named num -9.56
##   ..- attr(*, "names")= chr "t"
##  $ parameter  : Named int 30
##   ..- attr(*, "names")= chr "df"
##  $ p.value    : num 1.29e-10
##  $ estimate   : Named num -0.868
##   ..- attr(*, "names")= chr "cor"
##  $ null.value : Named num 0
##   ..- attr(*, "names")= chr "correlation"
##  $ alternative: chr "two.sided"
##  $ method     : chr "Pearson's product-moment correlation"
##  $ data.name  : chr "my_data$wt and my_data$mpg"
##  $ conf.int   : atomic [1:2] -0.934 -0.744
##   ..- attr(*, "conf.level")= num 0.95
##  - attr(*, "class")= chr "htest"</code></pre>
<p>Of these we are most interested with:</p>
<ul>
<li><code>p.value</code>: the p-value of the test</li>
<li><code>estimate</code>: the correlation coefficient</li>
</ul>
<pre class="r"><code class="r"><span class="comment"># Extract the p.value</span>
<span class="identifier">res</span><span class="operator">$</span><span class="identifier">p.value</span></code></pre>
<pre><code>## [1] 1.293959e-10</code></pre>
<pre class="r"><code class="r"><span class="comment"># Extract the correlation coefficient</span>
<span class="identifier">res</span><span class="operator">$</span><span class="identifier">estimate</span></code></pre>
<pre><code>##        cor 
## -0.8676594</code></pre>
</div>
<div id="kendall-rank-correlation-test" class="section level2">
<h2>Kendall rank correlation test</h2>
<p><strong>The Kendall rank correlation coefficient</strong> or <strong>Kendall’s <span class="math inline"><span id="MathJax-Element-9-Frame" class="MathJax" style="box-sizing: border-box; display: inline; font-style: normal; font-weight: normal; line-height: normal; font-size: 14px; text-indent: 0px; text-align: left; text-transform: none; letter-spacing: normal; word-spacing: normal; word-wrap: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0px; min-height: 0px; border: 0px; padding: 0px; margin: 0px; position: relative;" tabindex="0" data-mathml="&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;mi&gt;&amp;#x03C4;&lt;/mi&gt;&lt;/math&gt;"><span id="MathJax-Span-42" class="math"><span id="MathJax-Span-43" class="mrow"><span id="MathJax-Span-44" class="mi">τ</span></span></span><span class="MJX_Assistive_MathML">τ</span></span></span> statistic</strong> is used to estimate a rank-based measure of association. This test may be used if the data do not necessarily come from a bivariate normal distribution.</p>
<pre class="r"><code class="r"><span class="identifier">res2</span> <span class="operator">&lt;-</span> <span class="identifier">cor.test</span><span class="paren">(</span><span class="identifier">my_data</span><span class="operator">$</span><span class="identifier">mpg</span>, <span class="identifier">my_data</span><span class="operator">$</span><span class="identifier">wt</span>, <span class="identifier">method</span> <span class="operator">=</span> <span class="string">"kendall"</span><span class="paren">)</span></code></pre>
<pre><code>## Warning in cor.test.default(my_data$mpg, my_data$wt, method = "kendall"):
## Cannot compute exact p-value with ties</code></pre>
<pre class="r"><code class="r"><span class="identifier">res2</span></code></pre>
<pre><code>## 
##  Kendall's rank correlation tau
## 
## data:  my_data$mpg and my_data$wt
## z = -5.7981, p-value = 6.706e-09
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
##        tau 
## -0.7278321</code></pre>
<p>Here <code>tau</code> is the Kendall correlation coefficient, so The correlation coefficient between <code>mpg</code> and <code>wy</code> is -0.7278 and the p-value is <span class="math inline"><span id="MathJax-Element-10-Frame" class="MathJax" style="box-sizing: border-box; display: inline; font-style: normal; font-weight: normal; line-height: normal; font-size: 14px; text-indent: 0px; text-align: left; text-transform: none; letter-spacing: normal; word-spacing: normal; word-wrap: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0px; min-height: 0px; border: 0px; padding: 0px; margin: 0px; position: relative;" tabindex="0" data-mathml="&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;msup&gt;&lt;mn&gt;6.70610&lt;/mn&gt;&lt;mrow class=&quot;MJX-TeXAtom-ORD&quot;&gt;&lt;mo&gt;&amp;#x2212;&lt;/mo&gt;&lt;mn&gt;9&lt;/mn&gt;&lt;/mrow&gt;&lt;/msup&gt;&lt;/math&gt;"><span id="MathJax-Span-45" class="math"><span id="MathJax-Span-46" class="mrow"><span id="MathJax-Span-47" class="msubsup"><span id="MathJax-Span-48" class="mn">6.70610</span><span id="MathJax-Span-49" class="texatom"><span id="MathJax-Span-50" class="mrow"><span id="MathJax-Span-51" class="mo">−</span><span id="MathJax-Span-52" class="mn">9</span></span></span></span></span></span><span class="MJX_Assistive_MathML">6.70610−9</span></span></span>.</p>
</div>
<div id="spearman-rank-correlation-coefficient" class="section level2">
<h2>Spearman rank correlation coefficient</h2>
<p><strong>Spearman’s <span class="math inline"><span id="MathJax-Element-11-Frame" class="MathJax" style="box-sizing: border-box; display: inline; font-style: normal; font-weight: normal; line-height: normal; font-size: 14px; text-indent: 0px; text-align: left; text-transform: none; letter-spacing: normal; word-spacing: normal; word-wrap: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0px; min-height: 0px; border: 0px; padding: 0px; margin: 0px; position: relative;" tabindex="0" data-mathml="&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;mi&gt;&amp;#x03C1;&lt;/mi&gt;&lt;/math&gt;"><span id="MathJax-Span-53" class="math"><span id="MathJax-Span-54" class="mrow"><span id="MathJax-Span-55" class="mi">ρ</span></span></span><span class="MJX_Assistive_MathML">ρ</span></span></span> statistic</strong> is also used to estimate a rank-based measure of association. This test may be used if the data do not come from a bivariate normal distribution.</p>
<pre class="r"><code class="r"><span class="identifier">res3</span> <span class="operator">&lt;-</span> <span class="identifier">cor.test</span><span class="paren">(</span><span class="identifier">my_data</span><span class="operator">$</span><span class="identifier">wt</span>, <span class="identifier">my_data</span><span class="operator">$</span><span class="identifier">mpg</span>, <span class="identifier">method</span> <span class="operator">=</span> <span class="string">"spearman"</span><span class="paren">)</span></code></pre>
<pre><code>## Warning in cor.test.default(my_data$wt, my_data$mpg, method = "spearman"):
## Cannot compute exact p-value with ties</code></pre>
<pre class="r"><code class="r"><span class="identifier">res3</span></code></pre>
<pre><code>## 
##  Spearman's rank correlation rho
## 
## data:  my_data$wt and my_data$mpg
## S = 10292, p-value = 1.488e-11
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## -0.886422</code></pre>
<p>Here, <code>rho</code> is the Spearman’s correlation coefficient, so the correlation coefficient between <code>mpg</code> and <code>wt</code> is -0.8864 and the p-value is <span class="math inline"><span id="MathJax-Element-12-Frame" class="MathJax" style="box-sizing: border-box; display: inline; font-style: normal; font-weight: normal; line-height: normal; font-size: 14px; text-indent: 0px; text-align: left; text-transform: none; letter-spacing: normal; word-spacing: normal; word-wrap: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0px; min-height: 0px; border: 0px; padding: 0px; margin: 0px; position: relative;" tabindex="0" data-mathml="&lt;math xmlns=&quot;http://www.w3.org/1998/Math/MathML&quot;&gt;&lt;msup&gt;&lt;mn&gt;1.48810&lt;/mn&gt;&lt;mrow class=&quot;MJX-TeXAtom-ORD&quot;&gt;&lt;mo&gt;&amp;#x2212;&lt;/mo&gt;&lt;mn&gt;11&lt;/mn&gt;&lt;/mrow&gt;&lt;/msup&gt;&lt;/math&gt;"><span id="MathJax-Span-56" class="math"><span id="MathJax-Span-57" class="mrow"><span id="MathJax-Span-58" class="msubsup"><span id="MathJax-Span-59" class="mn">1.48810</span><span id="MathJax-Span-60" class="texatom"><span id="MathJax-Span-61" class="mrow"><span id="MathJax-Span-62" class="mo">−</span><span id="MathJax-Span-63" class="mn">11</span></span></span></span></span></span><span class="MJX_Assistive_MathML">1.48810−11</span></span></span>.</p>
</div>
<div id="how-to-interpret-correlation-coefficient" class="section level2">
<h2>How to interpret correlation coefficient</h2>
<p><strong>Value of the correlation coefficient can vary between -1 and 1:</strong></p>
<ul>
<li>-1 indicates a strong negative correlation : this means that every time x increases, y decreases</li>
<li>0 means that there is no association between the two variables (x and y)</li>
<li>1 indicates a strong positive correlation : this means that y increases with x</li>
</ul>
</div>
<div id="what-is-a-correlation-matrix" class="section level2">
<h2>What is a correlation matrix?</h2>
<p>Previously, we described how to perform correlation test between two variables. In the following sections we’ll see how a <strong>correlation matrix</strong> can be computed and visualized. The <strong>correlation matrix</strong> is used to investigate the dependence between multiple variables at the same time. The result is a table containing the correlation coefficients between each variable and the others.</p>
</div>
<div id="compute-correlation-matrix-in-r" class="section level2">
<h2>Compute correlation matrix in R</h2>
<p>We have already mentioned the <code>cor()</code> function, at the intoductory part of this document dealing with the correlation test for a bivariate case. It be used to compute a correlation matrix. A simplified format of the function is :</p>
<pre class="r"><code class="r"><span class="identifier">cor</span><span class="paren">(</span><span class="identifier">x</span>, <span class="identifier">method</span> <span class="operator">=</span> <span class="identifier">c</span><span class="paren">(</span><span class="string">"pearson"</span>, <span class="string">"kendall"</span>, <span class="string">"spearman"</span><span class="paren">)</span><span class="paren">)</span> </code></pre>
<p>Here:</p>
<ul>
<li><code>x</code> is numeric matrix or a data frame.</li>
<li><code>method</code>: indicates the correlation coefficient to be computed. The default is “pearson”&#8221; correlation coefficient which measures the linear dependence between two variables. As already explained “kendall” and “spearman” correlation methods are non-parametric rank-based correlation tests.</li>
</ul>
<p>If your data contain missing values, the following R code can be used to handle missing values by case-wise deletion:</p>
<pre class="r"><code class="r"><span class="identifier">cor</span><span class="paren">(</span><span class="identifier">x</span>, <span class="identifier">method</span> <span class="operator">=</span> <span class="string">"pearson"</span>, <span class="identifier">use</span> <span class="operator">=</span> <span class="string">"complete.obs"</span><span class="paren">)</span></code></pre>
<div id="plain-correlation-matrix" class="section level3">
<h3>Plain correlation matrix</h3>
<p><strong>Example:</strong></p>
<pre class="r"><code class="r"><span class="keyword">library</span><span class="paren">(</span><span class="identifier">dplyr</span><span class="paren">)</span>

<span class="identifier">my_data</span> <span class="operator">&lt;-</span> <span class="identifier">select</span><span class="paren">(</span><span class="identifier">mtcars</span>, <span class="identifier">mpg</span>, <span class="identifier">disp</span>, <span class="identifier">hp</span>, <span class="identifier">drat</span>, <span class="identifier">wt</span>, <span class="identifier">qsec</span><span class="paren">)</span>
<span class="identifier">head</span><span class="paren">(</span><span class="identifier">my_data</span><span class="paren">)</span></code></pre>
<pre><code>##                    mpg disp  hp drat    wt  qsec
## Mazda RX4         21.0  160 110 3.90 2.620 16.46
## Mazda RX4 Wag     21.0  160 110 3.90 2.875 17.02
## Datsun 710        22.8  108  93 3.85 2.320 18.61
## Hornet 4 Drive    21.4  258 110 3.08 3.215 19.44
## Hornet Sportabout 18.7  360 175 3.15 3.440 17.02
## Valiant           18.1  225 105 2.76 3.460 20.22</code></pre>
<pre class="r"><code class="r"><span class="comment">#Let's compute the correlation matrix</span>
<span class="identifier">cor_1</span> <span class="operator">&lt;-</span> <span class="identifier">round</span><span class="paren">(</span><span class="identifier">cor</span><span class="paren">(</span><span class="identifier">my_data</span><span class="paren">)</span>, <span class="number">2</span><span class="paren">)</span>
<span class="identifier">cor_1</span></code></pre>
<pre><code>##        mpg  disp    hp  drat    wt  qsec
## mpg   1.00 -0.85 -0.78  0.68 -0.87  0.42
## disp -0.85  1.00  0.79 -0.71  0.89 -0.43
## hp   -0.78  0.79  1.00 -0.45  0.66 -0.71
## drat  0.68 -0.71 -0.45  1.00 -0.71  0.09
## wt   -0.87  0.89  0.66 -0.71  1.00 -0.17
## qsec  0.42 -0.43 -0.71  0.09 -0.17  1.00</code></pre>
<p>Unfortunately, the function <code>cor()</code> returns only the correlation coefficients between variables. In the next section, we will use <code>Hmisc</code> R package to calculate the correlation <em>p-values</em>.</p>
</div>
<div id="correlation-matrix-with-significance-levels-p-value" class="section level3">
<h3>Correlation matrix with significance levels (p-value)</h3>
<p>The function <code>rcorr()</code> (in <code>Hmisc</code> package) can be used to compute the significance levels for pearson and spearman correlations. It returns both the <em>correlation coefficients</em> and the <em>p-value</em> of the correlation for all possible pairs of columns in the data table.</p>
<p><em>Simplified format:</em></p>
<pre class="r"><code class="r"><span class="identifier">rcorr</span><span class="paren">(</span><span class="identifier">x</span>, <span class="identifier">type</span> <span class="operator">=</span> <span class="identifier">c</span><span class="paren">(</span><span class="string">"pearson"</span>,<span class="string">"spearman"</span><span class="paren">)</span><span class="paren">)</span></code></pre>
<p><em>x</em> should be a matrix. The correlation type can be either <em>pearson</em> or <em>spearman</em>.</p>
<p><strong>Example:</strong></p>
<pre class="r"><code class="r"><span class="keyword">library</span><span class="paren">(</span><span class="string">"Hmisc"</span><span class="paren">)</span>

<span class="identifier">cor_2</span> <span class="operator">&lt;-</span> <span class="identifier">rcorr</span><span class="paren">(</span><span class="identifier">as.matrix</span><span class="paren">(</span><span class="identifier">my_data</span><span class="paren">)</span><span class="paren">)</span>
<span class="identifier">cor_2</span></code></pre>
<pre><code>##        mpg  disp    hp  drat    wt  qsec
## mpg   1.00 -0.85 -0.78  0.68 -0.87  0.42
## disp -0.85  1.00  0.79 -0.71  0.89 -0.43
## hp   -0.78  0.79  1.00 -0.45  0.66 -0.71
## drat  0.68 -0.71 -0.45  1.00 -0.71  0.09
## wt   -0.87  0.89  0.66 -0.71  1.00 -0.17
## qsec  0.42 -0.43 -0.71  0.09 -0.17  1.00
## 
## n= 32 
## 
## 
## P
##      mpg    disp   hp     drat   wt     qsec  
## mpg         0.0000 0.0000 0.0000 0.0000 0.0171
## disp 0.0000        0.0000 0.0000 0.0000 0.0131
## hp   0.0000 0.0000        0.0100 0.0000 0.0000
## drat 0.0000 0.0000 0.0100        0.0000 0.6196
## wt   0.0000 0.0000 0.0000 0.0000        0.3389
## qsec 0.0171 0.0131 0.0000 0.6196 0.3389</code></pre>
<p>The output of the function <code>rcorr()</code> is a list containing the following elements :</p>
<ul>
<li><strong>r</strong> : the correlation matrix</li>
<li><strong>n</strong> : the matrix of the number of observations used in analyzing each pair of variables</li>
<li><strong>P</strong> : the p-values corresponding to the significance levels of correlations.</li>
</ul>
<p>Extracting the p-values or the correlation coefficients from the output:</p>
<pre class="r"><code class="r"><span class="identifier">str</span><span class="paren">(</span><span class="identifier">cor_2</span><span class="paren">)</span></code></pre>
<pre><code>## List of 3
##  $ r: num [1:6, 1:6] 1 -0.848 -0.776 0.681 -0.868 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:6] "mpg" "disp" "hp" "drat" ...
##   .. ..$ : chr [1:6] "mpg" "disp" "hp" "drat" ...
##  $ n: int [1:6, 1:6] 32 32 32 32 32 32 32 32 32 32 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:6] "mpg" "disp" "hp" "drat" ...
##   .. ..$ : chr [1:6] "mpg" "disp" "hp" "drat" ...
##  $ P: num [1:6, 1:6] NA 9.38e-10 1.79e-07 1.78e-05 1.29e-10 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:6] "mpg" "disp" "hp" "drat" ...
##   .. ..$ : chr [1:6] "mpg" "disp" "hp" "drat" ...
##  - attr(*, "class")= chr "rcorr"</code></pre>
<pre class="r"><code class="r"><span class="comment"># As you can see "cor_2" is a list so extracting these values is quite simple...</span>

<span class="comment"># p-values</span>
<span class="identifier">cor_2</span><span class="operator">$</span><span class="identifier">P</span></code></pre>
<pre><code>##               mpg         disp           hp         drat           wt
## mpg            NA 9.380354e-10 1.787838e-07 1.776241e-05 1.293956e-10
## disp 9.380354e-10           NA 7.142686e-08 5.282028e-06 1.222311e-11
## hp   1.787838e-07 7.142686e-08           NA 9.988768e-03 4.145833e-05
## drat 1.776241e-05 5.282028e-06 9.988768e-03           NA 4.784268e-06
## wt   1.293956e-10 1.222311e-11 4.145833e-05 4.784268e-06           NA
## qsec 1.708199e-02 1.314403e-02 5.766250e-06 6.195823e-01 3.388682e-01
##              qsec
## mpg  1.708199e-02
## disp 1.314403e-02
## hp   5.766250e-06
## drat 6.195823e-01
## wt   3.388682e-01
## qsec           NA</code></pre>
<pre class="r"><code class="r"><span class="comment"># Correlation matrix</span>
<span class="identifier">cor_2</span><span class="operator">$</span><span class="identifier">r</span></code></pre>
<pre><code>##             mpg       disp         hp        drat         wt        qsec
## mpg   1.0000000 -0.8475513 -0.7761683  0.68117189 -0.8676594  0.41868404
## disp -0.8475513  1.0000000  0.7909486 -0.71021390  0.8879799 -0.43369791
## hp   -0.7761683  0.7909486  1.0000000 -0.44875914  0.6587479 -0.70822340
## drat  0.6811719 -0.7102139 -0.4487591  1.00000000 -0.7124406  0.09120482
## wt   -0.8676594  0.8879799  0.6587479 -0.71244061  1.0000000 -0.17471591
## qsec  0.4186840 -0.4336979 -0.7082234  0.09120482 -0.1747159  1.00000000</code></pre>
</div>
</div>
<div id="custom-function-for-convinient-formatting-of-the-correlation-matrix" class="section level2">
<h2>Custom function for convinient formatting of the correlation matrix</h2>
<p>This section provides a simple function for formatting a correlation matrix into a table with 4 columns containing :</p>
<ul>
<li>Column 1 : row names (variable 1 for the correlation test)</li>
<li>Column 2 : column names (variable 2 for the correlation test)</li>
<li>Column 3 : the correlation coefficients</li>
<li>Column 4 : the p-values of the correlations</li>
</ul>
<pre class="r"><code class="r"><span class="identifier">flat_cor_mat</span> <span class="operator">&lt;-</span> <span class="keyword">function</span><span class="paren">(</span><span class="identifier">cor_r</span>, <span class="identifier">cor_p</span><span class="paren">)</span><span class="paren">{</span>
  <span class="comment">#This function provides a simple formatting of a correlation matrix</span>
  <span class="comment">#into a table with 4 columns containing :</span>
    <span class="comment"># Column 1 : row names (variable 1 for the correlation test)</span>
    <span class="comment"># Column 2 : column names (variable 2 for the correlation test)</span>
    <span class="comment"># Column 3 : the correlation coefficients</span>
    <span class="comment"># Column 4 : the p-values of the correlations</span>
  <span class="keyword">library</span><span class="paren">(</span><span class="identifier">tidyr</span><span class="paren">)</span>
  <span class="keyword">library</span><span class="paren">(</span><span class="identifier">tibble</span><span class="paren">)</span>
  <span class="identifier">cor_r</span> <span class="operator">&lt;-</span> <span class="identifier">rownames_to_column</span><span class="paren">(</span><span class="identifier">as.data.frame</span><span class="paren">(</span><span class="identifier">cor_r</span><span class="paren">)</span>, <span class="identifier">var</span> <span class="operator">=</span> <span class="string">"row"</span><span class="paren">)</span>
  <span class="identifier">cor_r</span> <span class="operator">&lt;-</span> <span class="identifier">gather</span><span class="paren">(</span><span class="identifier">cor_r</span>, <span class="identifier">column</span>, <span class="identifier">cor</span>, <span class="operator">-</span><span class="number">1</span><span class="paren">)</span>
  <span class="identifier">cor_p</span> <span class="operator">&lt;-</span> <span class="identifier">rownames_to_column</span><span class="paren">(</span><span class="identifier">as.data.frame</span><span class="paren">(</span><span class="identifier">cor_p</span><span class="paren">)</span>, <span class="identifier">var</span> <span class="operator">=</span> <span class="string">"row"</span><span class="paren">)</span>
  <span class="identifier">cor_p</span> <span class="operator">&lt;-</span> <span class="identifier">gather</span><span class="paren">(</span><span class="identifier">cor_p</span>, <span class="identifier">column</span>, <span class="identifier">p</span>, <span class="operator">-</span><span class="number">1</span><span class="paren">)</span>
  <span class="identifier">cor_p_matrix</span> <span class="operator">&lt;-</span> <span class="identifier">left_join</span><span class="paren">(</span><span class="identifier">cor_r</span>, <span class="identifier">cor_p</span>, <span class="identifier">by</span> <span class="operator">=</span> <span class="identifier">c</span><span class="paren">(</span><span class="string">"row"</span>, <span class="string">"column"</span><span class="paren">)</span><span class="paren">)</span>
  <span class="identifier">cor_p_matrix</span>
<span class="paren">}</span>

<span class="identifier">cor_3</span> <span class="operator">&lt;-</span> <span class="identifier">rcorr</span><span class="paren">(</span><span class="identifier">as.matrix</span><span class="paren">(</span><span class="identifier">mtcars</span><span class="paren">[</span>, <span class="number">1</span><span class="operator">:</span><span class="number">7</span><span class="paren">]</span><span class="paren">)</span><span class="paren">)</span>

<span class="identifier">my_cor_matrix</span> <span class="operator">&lt;-</span> <span class="identifier">flat_cor_mat</span><span class="paren">(</span><span class="identifier">cor_3</span><span class="operator">$</span><span class="identifier">r</span>, <span class="identifier">cor_3</span><span class="operator">$</span><span class="identifier">P</span><span class="paren">)</span>
<span class="identifier">head</span><span class="paren">(</span><span class="identifier">my_cor_matrix</span><span class="paren">)</span></code></pre>
<pre><code>##    row column        cor            p
## 1  mpg    mpg  1.0000000           NA
## 2  cyl    mpg -0.8521619 6.112697e-10
## 3 disp    mpg -0.8475513 9.380354e-10
## 4   hp    mpg -0.7761683 1.787838e-07
## 5 drat    mpg  0.6811719 1.776241e-05
## 6   wt    mpg -0.8676594 1.293956e-10</code></pre>
</div>
<div id="visualization-of-a-correlation-matrix" class="section level2">
<h2>Visualization of a correlation matrix</h2>
<p>There are several different ways for visualizing a correlation matrix in R software:</p>
<ul>
<li><code>symnum()</code> function</li>
<li><code>corrplot()</code> function to plot a correlogram</li>
<li>scatter plots</li>
<li>heatmap</li>
</ul>
<p>We’ll run trough all of these, and then go a bit more into deatil with correlograms.</p>
<div id="use-symnum-function-symbolic-number-coding" class="section level3">
<h3>Use <code>symnum()</code> function: Symbolic number coding</h3>
<p>The R function <code>symnum()</code> is used to symbolically encode a given numeric or logical vector or array. It is particularly useful for visualization of structured matrices, e.g., correlation, sparse, or logical ones. In the case of a correlatino matrix it replaces correlation coefficients by symbols according to the level of the correlation.</p>
<p><em>Simplified format:</em></p>
<pre class="r"><code class="r"><span class="identifier">symnum</span><span class="paren">(</span><span class="identifier">x</span>, <span class="identifier">cutpoints</span> <span class="operator">=</span> <span class="identifier">c</span><span class="paren">(</span><span class="number">0.3</span>, <span class="number">0.6</span>, <span class="number">0.8</span>, <span class="number">0.9</span>, <span class="number">0.95</span><span class="paren">)</span>,
       <span class="identifier">symbols</span> <span class="operator">=</span> <span class="identifier">c</span><span class="paren">(</span><span class="string">" "</span>, <span class="string">"."</span>, <span class="string">","</span>, <span class="string">"+"</span>, <span class="string">"*"</span>, <span class="string">"B"</span><span class="paren">)</span>,
       <span class="identifier">abbr.colnames</span> <span class="operator">=</span> <span class="literal">TRUE</span><span class="paren">)</span></code></pre>
<p>Here:</p>
<ul>
<li><strong>x:</strong> the correlation matrix to visualize</li>
<li><strong>cutpoints:</strong> correlation coefficient cutpoints. The correlation coefficients between 0 and 0.3 are replaced by a space (&#8221; “); correlation coefficients between 0.3 and 0.6 are replaced by”.“; etc .</li>
<li><strong>symbols:</strong> the symbols to use.</li>
<li><strong>abbr.colnames:</strong> logical value. If TRUE, colnames are abbreviated.</li>
</ul>
<p><strong>Example:</strong></p>
<pre class="r"><code class="r"><span class="identifier">cor_4</span> <span class="operator">&lt;-</span> <span class="identifier">cor</span><span class="paren">(</span><span class="identifier">mtcars</span><span class="paren">[</span><span class="number">1</span><span class="operator">:</span><span class="number">6</span><span class="paren">]</span><span class="paren">)</span>
<span class="identifier">symnum</span><span class="paren">(</span><span class="identifier">cor_4</span>, <span class="identifier">abbr.colnames</span> <span class="operator">=</span> <span class="literal">FALSE</span><span class="paren">)</span></code></pre>
<pre><code>##      mpg cyl disp hp drat wt
## mpg  1                      
## cyl  +   1                  
## disp +   *   1              
## hp   ,   +   ,    1         
## drat ,   ,   ,    .  1      
## wt   +   ,   +    ,  ,    1 
## attr(,"legend")
## [1] 0 ' ' 0.3 '.' 0.6 ',' 0.8 '+' 0.9 '*' 0.95 'B' 1</code></pre>
<p>*As indicated in the legend, the correlation coefficients between 0 and 0.3 are replaced by a space (&#8221; “); correlation coefficients between 0.3 and 0.6 are replace by”.“; etc .*</p>
</div>
<div id="use-the-corrplot-function-draw-a-correlogram" class="section level3">
<h3>Use the <code>corrplot()</code> function: Draw a correlogram</h3>
<p>The function <code>corrplot()</code>, in the package of the same name, creates a graphical display of a correlation matrix, highlighting the most correlated variables in a data table.</p>
<p>In this plot, correlation coefficients are colored according to the value. Correlation matrix can be also reordered according to the degree of association between variables.</p>
<p><em>The simplified format of the function is:</em></p>
<pre class="r"><code class="r"><span class="identifier">corrplot</span><span class="paren">(</span><span class="identifier">corr</span>, <span class="identifier">method</span><span class="operator">=</span><span class="string">"circle"</span><span class="paren">)</span></code></pre>
<p>Here:</p>
<ul>
<li><strong>corr:</strong> the correlation matrix to be visualized</li>
<li><strong>method:</strong> The visualization method to be used, there are seven different options: “circle”, “square”, “ellipse”, “number”, “shade”, “color”, “pie”.</li>
</ul>
<p><strong>Example:</strong></p>
<pre class="r"><code class="r"><span class="identifier">M</span><span class="operator">&lt;-</span><span class="identifier">cor</span><span class="paren">(</span><span class="identifier">mtcars</span><span class="paren">)</span>
<span class="identifier">head</span><span class="paren">(</span><span class="identifier">round</span><span class="paren">(</span><span class="identifier">M</span>,<span class="number">2</span><span class="paren">)</span><span class="paren">)</span></code></pre>
<pre><code>##        mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
## mpg   1.00 -0.85 -0.85 -0.78  0.68 -0.87  0.42  0.66  0.60  0.48 -0.55
## cyl  -0.85  1.00  0.90  0.83 -0.70  0.78 -0.59 -0.81 -0.52 -0.49  0.53
## disp -0.85  0.90  1.00  0.79 -0.71  0.89 -0.43 -0.71 -0.59 -0.56  0.39
## hp   -0.78  0.83  0.79  1.00 -0.45  0.66 -0.71 -0.72 -0.24 -0.13  0.75
## drat  0.68 -0.70 -0.71 -0.45  1.00 -0.71  0.09  0.44  0.71  0.70 -0.09
## wt   -0.87  0.78  0.89  0.66 -0.71  1.00 -0.17 -0.55 -0.69 -0.58  0.43</code></pre>
<pre class="r"><code class="r"><span class="comment">#Visualize the correlation matrix</span>

<span class="comment"># method = "circle""</span>
<span class="identifier">corrplot</span><span class="paren">(</span><span class="identifier">M</span>, <span class="identifier">method</span> <span class="operator">=</span> <span class="string">"circle"</span><span class="paren">)</span></code></pre>
<p>&nbsp;</p>
<pre class="r"><code class="r"><span class="comment"># method = "ellipse""</span>
<span class="identifier">corrplot</span><span class="paren">(</span><span class="identifier">M</span>, <span class="identifier">method</span> <span class="operator">=</span> <span class="string">"ellipse"</span><span class="paren">)</span></code></pre>
<p>&nbsp;</p>
<pre class="r"><code class="r"><span class="comment"># method = "pie"</span>
<span class="identifier">corrplot</span><span class="paren">(</span><span class="identifier">M</span>, <span class="identifier">method</span> <span class="operator">=</span> <span class="string">"pie"</span><span class="paren">)</span></code></pre>
<p>&nbsp;</p>
<pre class="r"><code class="r"><span class="comment"># method = "color"</span>
<span class="identifier">corrplot</span><span class="paren">(</span><span class="identifier">M</span>, <span class="identifier">method</span> <span class="operator">=</span> <span class="string">"color"</span><span class="paren">)</span></code></pre>
<p>&nbsp;</p>
<p><em>Display the <strong>correlation coefficient</strong>:</em></p>
<pre class="r"><code class="r"><span class="identifier">corrplot</span><span class="paren">(</span><span class="identifier">M</span>, <span class="identifier">method</span> <span class="operator">=</span> <span class="string">"number"</span><span class="paren">)</span></code></pre>
<p>&nbsp;</p>
<div id="correlogram-layouts" class="section level4">
<h4>Correlogram layouts:</h4>
<p>There are three general types of a correlogram layout :</p>
<ul>
<li><strong>“full”</strong> (default) : display full correlation matrix</li>
<li><strong>“upper”</strong>: display upper triangular of the correlation matrix</li>
<li><strong>“lower”</strong>: display lower triangular of the correlation matrix</li>
</ul>
<p><strong>Examples:</strong></p>
<pre class="r"><code class="r"><span class="comment"># upper triangular</span>
<span class="identifier">corrplot</span><span class="paren">(</span><span class="identifier">M</span>, <span class="identifier">type</span> <span class="operator">=</span> <span class="string">"upper"</span><span class="paren">)</span></code></pre>
<p>&nbsp;</p>
<pre class="r"><code class="r"><span class="comment">#lower triangular</span>
<span class="identifier">corrplot</span><span class="paren">(</span><span class="identifier">M</span>, <span class="identifier">type</span> <span class="operator">=</span> <span class="string">"lower"</span><span class="paren">)</span></code></pre>
<p>&nbsp;</p>
</div>
<div id="reordering-the-correlation-matrix" class="section level4">
<h4>Reordering the correlation matrix</h4>
<p>The correlation matrix can be reordered according to the correlation coefficient. This is important to identify the hidden structure and pattern in the matrix. Use <code>order = "hclust"</code> argument for hierarchical clustering of correlation coefficients.</p>
<p><strong>Example:</strong></p>
<pre class="r"><code class="r"><span class="comment"># correlogram with hclust reordering</span>
<span class="identifier">corrplot</span><span class="paren">(</span><span class="identifier">M</span>, <span class="identifier">order</span> <span class="operator">=</span> <span class="string">"hclust"</span><span class="paren">)</span></code></pre>
<p>&nbsp;</p>
<pre class="r"><code class="r"><span class="comment"># or exploit the symetry of the correlation matrix </span>
<span class="comment"># correlogram with hclust reordering</span>
<span class="identifier">corrplot</span><span class="paren">(</span><span class="identifier">M</span>, <span class="identifier">type</span> <span class="operator">=</span> <span class="string">"upper"</span>, <span class="identifier">order</span> <span class="operator">=</span> <span class="string">"hclust"</span><span class="paren">)</span></code></pre>
<p>&nbsp;</p>
</div>
<div id="changing-the-color-and-direction-of-text-labels-in-the-correlogram" class="section level4">
<h4>Changing the color and direction of text labels in the correlogram</h4>
<p><strong>Examples:</strong></p>
<pre class="r"><code class="r"><span class="comment"># Change background color to lightgreen and color of the circles to darkorange and steel blue</span>
<span class="identifier">corrplot</span><span class="paren">(</span><span class="identifier">M</span>, <span class="identifier">type</span> <span class="operator">=</span> <span class="string">"upper"</span>, <span class="identifier">order</span> <span class="operator">=</span> <span class="string">"hclust"</span>, <span class="identifier">col</span> <span class="operator">=</span> <span class="identifier">c</span><span class="paren">(</span><span class="string">"darkorange"</span>, <span class="string">"steelblue"</span><span class="paren">)</span>,
         <span class="identifier">bg</span> <span class="operator">=</span> <span class="string">"lightgreen"</span><span class="paren">)</span></code></pre>
<p>&nbsp;</p>
<pre class="r"><code class="r"><span class="comment"># use "colorRampPallete" to obtain contionus color scales</span>
<span class="identifier">col</span> <span class="operator">&lt;-</span> <span class="identifier">colorRampPalette</span><span class="paren">(</span><span class="identifier">c</span><span class="paren">(</span><span class="string">"darkorange"</span>, <span class="string">"white"</span>, <span class="string">"steelblue"</span><span class="paren">)</span><span class="paren">)</span><span class="paren">(</span><span class="number">20</span><span class="paren">)</span>
<span class="identifier">corrplot</span><span class="paren">(</span><span class="identifier">M</span>, <span class="identifier">type</span> <span class="operator">=</span> <span class="string">"upper"</span>, <span class="identifier">order</span> <span class="operator">=</span> <span class="string">"hclust"</span>, <span class="identifier">col</span> <span class="operator">=</span> <span class="identifier">col</span><span class="paren">)</span></code></pre>
<p>&nbsp;</p>
<pre class="r"><code class="r"><span class="comment"># Or use "RColorBrewer" package</span>
<span class="keyword">library</span><span class="paren">(</span><span class="identifier">RColorBrewer</span><span class="paren">)</span>
<span class="identifier">corrplot</span><span class="paren">(</span><span class="identifier">M</span>, <span class="identifier">type</span> <span class="operator">=</span> <span class="string">"upper"</span>, <span class="identifier">order</span> <span class="operator">=</span> <span class="string">"hclust"</span>,
         <span class="identifier">col</span> <span class="operator">=</span> <span class="identifier">brewer.pal</span><span class="paren">(</span><span class="identifier">n</span> <span class="operator">=</span> <span class="number">9</span>, <span class="identifier">name</span> <span class="operator">=</span> <span class="string">"PuOr"</span><span class="paren">)</span>, <span class="identifier">bg</span> <span class="operator">=</span> <span class="string">"darkgreen"</span><span class="paren">)</span></code></pre>
<p>&nbsp;</p>
<p>Use the <strong><code>tl.col</code></strong> argument for defining the text label color and <strong><code>tl.srt</code></strong> for text label string rotation.</p>
<p><strong>Example:</strong></p>
<pre class="r"><code class="r"><span class="identifier">corrplot</span><span class="paren">(</span><span class="identifier">M</span>, <span class="identifier">type</span> <span class="operator">=</span> <span class="string">"upper"</span>, <span class="identifier">order</span> <span class="operator">=</span> <span class="string">"hclust"</span>, <span class="identifier">tl.col</span> <span class="operator">=</span> <span class="string">"darkblue"</span>, <span class="identifier">tl.srt</span> <span class="operator">=</span> <span class="number">45</span><span class="paren">)</span></code></pre>
<p>&nbsp;</p>
</div>
<div id="combining-correlogram-with-the-significance-test" class="section level4">
<h4>Combining correlogram with the significance test</h4>
<pre class="r"><code class="r"><span class="comment"># Mark the insignificant coefficients according to the specified p-value significance level</span>
<span class="identifier">cor_5</span> <span class="operator">&lt;-</span> <span class="identifier">rcorr</span><span class="paren">(</span><span class="identifier">as.matrix</span><span class="paren">(</span><span class="identifier">mtcars</span><span class="paren">)</span><span class="paren">)</span>
<span class="identifier">M</span> <span class="operator">&lt;-</span> <span class="identifier">cor_5</span><span class="operator">$</span><span class="identifier">r</span>
<span class="identifier">p_mat</span> <span class="operator">&lt;-</span> <span class="identifier">cor_5</span><span class="operator">$</span><span class="identifier">P</span>
<span class="identifier">corrplot</span><span class="paren">(</span><span class="identifier">M</span>, <span class="identifier">type</span> <span class="operator">=</span> <span class="string">"upper"</span>, <span class="identifier">order</span> <span class="operator">=</span> <span class="string">"hclust"</span>, 
         <span class="identifier">p.mat</span> <span class="operator">=</span> <span class="identifier">p_mat</span>, <span class="identifier">sig.level</span> <span class="operator">=</span> <span class="number">0.01</span><span class="paren">)</span></code></pre>
<p>&nbsp;</p>
<pre class="r"><code class="r"><span class="comment"># Leave blank on no significant coefficient</span>
<span class="identifier">corrplot</span><span class="paren">(</span><span class="identifier">M</span>, <span class="identifier">type</span> <span class="operator">=</span> <span class="string">"upper"</span>, <span class="identifier">order</span> <span class="operator">=</span> <span class="string">"hclust"</span>, 
         <span class="identifier">p.mat</span> <span class="operator">=</span> <span class="identifier">p_mat</span>, <span class="identifier">sig.level</span> <span class="operator">=</span> <span class="number">0.05</span>, <span class="identifier">insig</span> <span class="operator">=</span> <span class="string">"blank"</span><span class="paren">)</span></code></pre>
<p>&nbsp;</p>
</div>
<div id="fine-tuning-customization-of-the-correlogram" class="section level4">
<h4>Fine tuning customization of the correlogram</h4>
<pre class="r"><code class="r"><span class="identifier">col</span> <span class="operator">&lt;-</span> <span class="identifier">colorRampPalette</span><span class="paren">(</span><span class="identifier">c</span><span class="paren">(</span><span class="string">"#BB4444"</span>, <span class="string">"#EE9988"</span>, <span class="string">"#FFFFFF"</span>, <span class="string">"#77AADD"</span>, <span class="string">"#4477AA"</span><span class="paren">)</span><span class="paren">)</span>
<span class="identifier">corrplot</span><span class="paren">(</span><span class="identifier">M</span>, <span class="identifier">method</span> <span class="operator">=</span> <span class="string">"color"</span>, <span class="identifier">col</span> <span class="operator">=</span> <span class="identifier">col</span><span class="paren">(</span><span class="number">200</span><span class="paren">)</span>,  
         <span class="identifier">type</span> <span class="operator">=</span> <span class="string">"upper"</span>, <span class="identifier">order</span> <span class="operator">=</span> <span class="string">"hclust"</span>, 
         <span class="identifier">addCoef.col</span> <span class="operator">=</span> <span class="string">"black"</span>, <span class="comment"># Add coefficient of correlation</span>
         <span class="identifier">tl.col</span> <span class="operator">=</span> <span class="string">"darkblue"</span>, <span class="identifier">tl.srt</span> <span class="operator">=</span> <span class="number">45</span>, <span class="comment">#Text label color and rotation</span>
         <span class="comment"># Combine with significance level</span>
         <span class="identifier">p.mat</span> <span class="operator">=</span> <span class="identifier">p_mat</span>, <span class="identifier">sig.level</span> <span class="operator">=</span> <span class="number">0.01</span>,  
         <span class="comment"># hide correlation coefficient on the principal diagonal</span>
         <span class="identifier">diag</span> <span class="operator">=</span> <span class="literal">FALSE</span> 
         <span class="paren">)</span></code></pre>
<p>&nbsp;</p>
<p>I’d say this is more than enough for introductory exploration of correlograms. More information can be found in the, already mentioned, vignette for the <code>corrplot</code> R package &#8211; <a href="https://cran.r-project.org/web/packages/corrplot/vignettes/corrplot-intro.html">An Introduction to corrplot Package</a></p>
</div>
</div>
<div id="use-chart.correlation-draw-scatter-plots" class="section level3">
<h3>Use <code>chart.Correlation()</code>: Draw scatter plots</h3>
<p>The function <code>chart.Correlation()</code> from the package “<code>PerformanceAnalytics</code>”, can be used to display a chart of a correlation matrix. This is a very convinient way of exploring multivariate correlations.</p>
<pre class="r"><code class="r"><span class="keyword">library</span><span class="paren">(</span><span class="string">"PerformanceAnalytics"</span><span class="paren">)</span></code></pre>
<pre><code>## Loading required package: xts</code></pre>
<pre><code>## Loading required package: zoo</code></pre>
<pre><code>## 
## Attaching package: 'zoo'</code></pre>
<pre><code>## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric</code></pre>
<pre><code>## 
## Attaching package: 'xts'</code></pre>
<pre><code>## The following objects are masked from 'package:dplyr':
## 
##     first, last</code></pre>
<pre><code>## 
## Attaching package: 'PerformanceAnalytics'</code></pre>
<pre><code>## The following object is masked from 'package:graphics':
## 
##     legend</code></pre>
<pre class="r"><code class="r"><span class="identifier">my_data</span> <span class="operator">&lt;-</span> <span class="identifier">mtcars</span><span class="paren">[</span>, <span class="identifier">c</span><span class="paren">(</span><span class="number">1</span>,<span class="number">3</span>,<span class="number">4</span>,<span class="number">5</span>,<span class="number">6</span>,<span class="number">7</span><span class="paren">)</span><span class="paren">]</span>
<span class="identifier">chart.Correlation</span><span class="paren">(</span><span class="identifier">my_data</span>, <span class="identifier">histogram</span> <span class="operator">=</span> <span class="literal">TRUE</span>, <span class="identifier">pch</span> <span class="operator">=</span> <span class="number">19</span><span class="paren">)</span></code></pre>
<p>&nbsp;</p>
<p>In the above plot:</p>
<ul>
<li>The distribution of each variable is shown on the diagonal.</li>
<li>On the bottom of the diagonal : the bivariate scatter plots with a fitted line are displayed</li>
<li>On the top of the diagonal : the value of the correlation plus the significance level as stars</li>
<li>Each significance level is associated to a symbol : p-values(0, 0.001, 0.01, 0.05, 0.1, 1) &lt;=&gt; symbols(“<em><strong>“,”</strong>“,”</em>”, “.”, &#8221; “)</li>
</ul>
</div>
<div id="use-heatmap" class="section level3">
<h3>Use <code>heatmap()</code></h3>
<p>I don’t really consider this method of correlation matrix visualization to be of practical value, but nevertheless here is a small example:</p>
<pre class="r"><code class="r"><span class="comment"># Get some colors</span>
<span class="identifier">col</span> <span class="operator">&lt;-</span> <span class="identifier">colorRampPalette</span><span class="paren">(</span><span class="identifier">c</span><span class="paren">(</span><span class="string">"darkblue"</span>, <span class="string">"white"</span>, <span class="string">"darkorange"</span><span class="paren">)</span><span class="paren">)</span><span class="paren">(</span><span class="number">20</span><span class="paren">)</span>
<span class="identifier">M</span> <span class="operator">&lt;-</span> <span class="identifier">cor</span><span class="paren">(</span><span class="identifier">mtcars</span><span class="paren">[</span><span class="number">1</span><span class="operator">:</span><span class="number">7</span><span class="paren">]</span><span class="paren">)</span>
<span class="identifier">heatmap</span><span class="paren">(</span><span class="identifier">x</span> <span class="operator">=</span> <span class="identifier">M</span>, <span class="identifier">col</span> <span class="operator">=</span> <span class="identifier">col</span>, <span class="identifier">symm</span> <span class="operator">=</span> <span class="literal">TRUE</span><span class="paren">)</span></code></pre>
<p>&nbsp;</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>https://www.biofacebook.com/?feed=rss2&#038;p=1373</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>R drawing png with high resolution</title>
		<link>https://www.biofacebook.com/?p=1369</link>
		<comments>https://www.biofacebook.com/?p=1369#comments</comments>
		<pubDate>Fri, 31 May 2019 04:34:37 +0000</pubDate>
		<dc:creator><![CDATA[szypanther]]></dc:creator>
				<category><![CDATA[Linux相关]]></category>
		<category><![CDATA[R语言]]></category>

		<guid isPermaLink="false">http://www.biofacebook.com/?p=1369</guid>
		<description><![CDATA[<p>可重复的示例：</p> the_plot &#60;- function() { x &#60;- seq(0, 1, length.out = 100) y &#60;- pbeta(x, 1, 10) plot( x, y, xlab = "False Positive Rate", ylab = "Average true positive rate", type = "l" ) } <p>&#160;</p> png( "test.png", width = 3.25, height = 3.25, units = "in", res = 1200, pointsize = 4 ) [...]]]></description>
				<content:encoded><![CDATA[<p>可重复的示例：</p>
<pre class="prettyprint"><code><span class="pln">the_plot </span><span class="pun">&lt;-</span> <span class="kwd">function</span><span class="pun">()</span>
<span class="pun">{</span><span class="pln">
  x </span><span class="pun">&lt;-</span><span class="pln"> seq</span><span class="pun">(</span><span class="lit">0</span><span class="pun">,</span> <span class="lit">1</span><span class="pun">,</span><span class="pln"> length</span><span class="pun">.</span><span class="kwd">out</span> <span class="pun">=</span> <span class="lit">100</span><span class="pun">)</span><span class="pln">
  y </span><span class="pun">&lt;-</span><span class="pln"> pbeta</span><span class="pun">(</span><span class="pln">x</span><span class="pun">,</span> <span class="lit">1</span><span class="pun">,</span> <span class="lit">10</span><span class="pun">)</span><span class="pln">
  plot</span><span class="pun">(</span><span class="pln">
    x</span><span class="pun">,</span><span class="pln">
    y</span><span class="pun">,</span><span class="pln">
    xlab </span><span class="pun">=</span> <span class="str">"False Positive Rate"</span><span class="pun">,</span><span class="pln">
    ylab </span><span class="pun">=</span> <span class="str">"Average true positive rate"</span><span class="pun">,</span><span class="pln">
    type </span><span class="pun">=</span> <span class="str">"l"</span>
  <span class="pun">)</span>
<span class="pun">}</span></code></pre>
<p>&nbsp;</p>
<pre class="prettyprint"><code><span class="pln">png</span><span class="pun">(</span>
  <span class="str">"test.png"</span><span class="pun">,</span><span class="pln">
  width     </span><span class="pun">=</span> <span class="lit">3.25</span><span class="pun">,</span><span class="pln">
  height    </span><span class="pun">=</span> <span class="lit">3.25</span><span class="pun">,</span><span class="pln">
  units     </span><span class="pun">=</span> <span class="str">"in"</span><span class="pun">,</span><span class="pln">
  res       </span><span class="pun">=</span> <span class="lit">1200</span><span class="pun">,</span><span class="pln">
  pointsize </span><span class="pun">=</span> <span class="lit">4</span>
<span class="pun">)</span><span class="pln">
par</span><span class="pun">(</span><span class="pln">
  mar      </span><span class="pun">=</span><span class="pln"> c</span><span class="pun">(</span><span class="lit">5</span><span class="pun">,</span> <span class="lit">5</span><span class="pun">,</span> <span class="lit">2</span><span class="pun">,</span> <span class="lit">2</span><span class="pun">),</span><span class="pln">
  xaxs     </span><span class="pun">=</span> <span class="str">"i"</span><span class="pun">,</span><span class="pln">
  yaxs     </span><span class="pun">=</span> <span class="str">"i"</span><span class="pun">,</span><span class="pln">
  cex</span><span class="pun">.</span><span class="pln">axis </span><span class="pun">=</span> <span class="lit">2</span><span class="pun">,</span><span class="pln">
  cex</span><span class="pun">.</span><span class="pln">lab  </span><span class="pun">=</span> <span class="lit">2</span>
<span class="pun">)</span><span class="pln">
the_plot</span><span class="pun">()</span><span class="pln">
dev</span><span class="pun">.</span><span class="pln">off</span><span class="pun">()</span></code></pre>
<p>当然，更好的解决方案是放弃这种基本的图形和使用一个系统，将处理你的分辨率缩放。例如，</p>
<pre class="prettyprint"><code><span class="pln">library</span><span class="pun">(</span><span class="pln">ggplot2</span><span class="pun">)</span><span class="pln">

ggplot_alternative </span><span class="pun">&lt;-</span> <span class="kwd">function</span><span class="pun">()</span>
<span class="pun">{</span><span class="pln">
  the_data </span><span class="pun">&lt;-</span><span class="pln"> data</span><span class="pun">.</span><span class="pln">frame</span><span class="pun">(</span><span class="pln">
    x </span><span class="pun">&lt;-</span><span class="pln"> seq</span><span class="pun">(</span><span class="lit">0</span><span class="pun">,</span> <span class="lit">1</span><span class="pun">,</span><span class="pln"> length</span><span class="pun">.</span><span class="kwd">out</span> <span class="pun">=</span> <span class="lit">100</span><span class="pun">),</span><span class="pln">
    y </span><span class="pun">=</span><span class="pln"> pbeta</span><span class="pun">(</span><span class="pln">x</span><span class="pun">,</span> <span class="lit">1</span><span class="pun">,</span> <span class="lit">10</span><span class="pun">)</span>
  <span class="pun">)</span><span class="pln">

ggplot</span><span class="pun">(</span><span class="pln">the_data</span><span class="pun">,</span><span class="pln"> aes</span><span class="pun">(</span><span class="pln">x</span><span class="pun">,</span><span class="pln"> y</span><span class="pun">))</span> <span class="pun">+</span><span class="pln">
    geom_line</span><span class="pun">()</span> <span class="pun">+</span><span class="pln">
    xlab</span><span class="pun">(</span><span class="str">"False Positive Rate"</span><span class="pun">)</span> <span class="pun">+</span><span class="pln">
    ylab</span><span class="pun">(</span><span class="str">"Average true positive rate"</span><span class="pun">)</span> <span class="pun">+</span><span class="pln">
    coord_cartesian</span><span class="pun">(</span><span class="lit">0</span><span class="pun">:</span><span class="lit">1</span><span class="pun">,</span> <span class="lit">0</span><span class="pun">:</span><span class="lit">1</span><span class="pun">)</span>
<span class="pun">}</span><span class="pln">

ggsave</span><span class="pun">(</span>
  <span class="str">"ggtest.png"</span><span class="pun">,</span><span class="pln">
  ggplot_alternative</span><span class="pun">(),</span><span class="pln">
  width </span><span class="pun">=</span> <span class="lit">3.25</span><span class="pun">,</span><span class="pln">
  height </span><span class="pun">=</span> <span class="lit">3.25</span><span class="pun">,</span><span class="pln">
  dpi </span><span class="pun">=</span> <span class="lit">1200</span>
<span class="pun">)</span></code></pre>
]]></content:encoded>
			<wfw:commentRss>https://www.biofacebook.com/?feed=rss2&#038;p=1369</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>fasta2nexus by R script</title>
		<link>https://www.biofacebook.com/?p=1309</link>
		<comments>https://www.biofacebook.com/?p=1309#comments</comments>
		<pubDate>Fri, 12 Jan 2018 08:59:54 +0000</pubDate>
		<dc:creator><![CDATA[szypanther]]></dc:creator>
				<category><![CDATA[Linux相关]]></category>
		<category><![CDATA[R语言]]></category>
		<category><![CDATA[生物信息]]></category>

		<guid isPermaLink="false">http://www.biofacebook.com/?p=1309</guid>
		<description><![CDATA[ Workspace loaded from ~/.RData] &#62; setwd("/home/shenzy/work/beast/51samples") &#62; library(seqinr) &#62; data=read.fasta("51strain_core_gene_alignment.aln") &#62; library(ape) Attaching package: ‘ape’ The following objects are masked from ‘package:seqinr’: as.alignment, consensus &#62; write.nexus.data(data,file="51strain_core_gene_alignment.aln.nexus", format="DNA") &#62; [...]]]></description>
				<content:encoded><![CDATA[<table class="GCWXI2KCCKB ace_text-layer ace_line GCWXI2KCAU" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td align="left">
<pre id="rstudio_console_output" class="GCWXI2KCJKB" tabindex="0">Workspace loaded from ~/.RData]

<span class="GCWXI2KCPKB ace_keyword">&gt; </span><span class="GCWXI2KCPJB ace_keyword">setwd("/home/shenzy/work/beast/51samples")
</span><span class="GCWXI2KCPKB ace_keyword">&gt; </span><span class="GCWXI2KCPJB ace_keyword">library(seqinr)
</span><span class="GCWXI2KCPKB ace_keyword">&gt; </span><span class="GCWXI2KCPJB ace_keyword">data=read.fasta("51strain_core_gene_alignment.aln")
</span><span class="GCWXI2KCPKB ace_keyword">&gt; </span><span class="GCWXI2KCPJB ace_keyword">library(ape)
</span><span class="GCWXI2KCDKB  ace_constant">
Attaching package: ‘ape’

</span><span class="GCWXI2KCDKB  ace_constant">The following objects are masked from ‘package:seqinr’:

    as.alignment, consensus

</span><span class="GCWXI2KCPKB ace_keyword">&gt; </span><span class="GCWXI2KCPJB ace_keyword">write.nexus.data(data,file="51strain_core_gene_alignment.aln.nexus", format="DNA")
</span></pre>
</td>
</tr>
<tr>
<td align="left"></td>
</tr>
<tr>
<td align="left">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td rowspan="1" align="left" width="1" height="">
<div class="GCWXI2KCPKB ace_keyword">&gt;</div>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
]]></content:encoded>
			<wfw:commentRss>https://www.biofacebook.com/?feed=rss2&#038;p=1309</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Remove grid and background from plot (ggplot2)</title>
		<link>https://www.biofacebook.com/?p=1264</link>
		<comments>https://www.biofacebook.com/?p=1264#comments</comments>
		<pubDate>Mon, 10 Apr 2017 06:16:20 +0000</pubDate>
		<dc:creator><![CDATA[szypanther]]></dc:creator>
				<category><![CDATA[R语言]]></category>

		<guid isPermaLink="false">http://www.biofacebook.com/?p=1264</guid>
		<description><![CDATA[ Remove grid and background from plot (ggplot2) HOMECATEGORIESTAGSMY TOOLSABOUTLEAVE MESSAGERSS 2013-11-27 &#124; category RStudy &#124; tag ggplot2 Generate data library(ggplot2) a &#60;- seq(1, 20) b &#60;- a^0.25 df &#60;- as.data.frame(cbind(a, b)) basic plot myplot = ggplot(df, aes(x = a, y = b)) + geom_point() myplot <p></p> theme_bw() will get rid of the background myplot [...]]]></description>
				<content:encoded><![CDATA[<header>
<h1>Remove grid and background from plot (ggplot2)</h1>
</header>
<nav><a title="网站首页" href="http://felixfan.github.io/" target="_blank">HOME</a><a title="文章分类" href="http://felixfan.github.io/categories/" target="_blank">CATEGORIES</a><a title="标签索引" href="http://felixfan.github.io/tags/" target="_blank">TAGS</a><a title="我的软件" href="http://felixfan.github.io/tools" target="_blank">MY TOOLS</a><a title="关于站长" href="http://felixfan.github.io/about/" target="_blank">ABOUT</a><a title="留言交流" href="http://felixfan.github.io/guestbook/" target="_blank">LEAVE MESSAGE</a><a title="种子订阅" href="http://felixfan.github.io/feed/" target="_blank">RSS</a></nav>
<article class="content">
<section class="meta"><time>2013-11-27</time> | category <a title="RStudy" href="http://felixfan.github.io/categories/#RStudy" target="_blank">RStudy</a>  | tag <a title="ggplot2" href="http://felixfan.github.io/tags/#ggplot2" target="_blank">ggplot2</a> </section>
<section class="post">
<h4 id="generate-data">Generate data</h4>
<div class="language-r highlighter-rouge">
<pre class="highlight"><code>library(ggplot2)
a &lt;- seq(1, 20)
b &lt;- a^0.25
df &lt;- as.data.frame(cbind(a, b))
</code></pre>
</div>
<h4 id="basic-plot">basic plot</h4>
<div class="language-r highlighter-rouge">
<pre class="highlight"><code>myplot = ggplot(df, aes(x = a, y = b)) + geom_point()
myplot
</code></pre>
</div>
<p><img src="http://felixfan.github.io/figure/ggplot-2-1.png" alt="plot of chunk ggplot-2-1" /></p>
<h4 id="themebw-will-get-rid-of-the-background">theme_bw() will get rid of the background</h4>
<div class="language-r highlighter-rouge">
<pre class="highlight"><code>myplot + theme_bw()
</code></pre>
</div>
<p><img src="http://felixfan.github.io/figure/ggplot-2-2.png" alt="plot of chunk ggplot-2-2" /></p>
<h4 id="remove-grid-does-not-remove-backgroud-colour-and-border-lines">remove grid (does not remove backgroud colour and border lines)</h4>
<div class="language-r highlighter-rouge">
<pre class="highlight"><code>myplot + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank())
</code></pre>
</div>
<p><img src="http://felixfan.github.io/figure/ggplot-2-3.png" alt="plot of chunk ggplot-2-3" /></p>
<h4 id="remove-border-lines-does-not-remove-backgroud-colour-and-grid-lines">remove border lines (does not remove backgroud colour and grid lines)</h4>
<div class="language-r highlighter-rouge">
<pre class="highlight"><code>myplot + theme(panel.border = element_blank())
</code></pre>
</div>
<p><img src="http://felixfan.github.io/figure/ggplot-2-4.png" alt="plot of chunk ggplot-2-4" /></p>
<h4 id="remove-background-remove-backgroud-colour-and-border-lines-but-does-not-remove-grid-lines">remove background (remove backgroud colour and border lines, but does not remove grid lines)</h4>
<div class="language-r highlighter-rouge">
<pre class="highlight"><code>myplot + theme(panel.background = element_blank())
</code></pre>
</div>
<p><img src="http://felixfan.github.io/figure/ggplot-2-5.png" alt="plot of chunk ggplot-2-5" /></p>
<h4 id="add-axis-line">add axis line</h4>
<div class="language-r highlighter-rouge">
<pre class="highlight"><code>myplot + theme(axis.line = element_line(colour = "black"))
</code></pre>
</div>
<p><img src="http://felixfan.github.io/figure/ggplot-2-6.png" alt="plot of chunk ggplot-2-6" /></p>
<h4 id="put-all-together---method-1">put all together &#8211; method 1</h4>
<div class="language-r highlighter-rouge">
<pre class="highlight"><code>myplot + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"))
</code></pre>
</div>
<p><img src="http://felixfan.github.io/figure/ggplot-2-7.png" alt="plot of chunk ggplot-2-8" /></p>
<h4 id="put-all-together---method-2">put all together &#8211; method 2</h4>
<div class="language-r highlighter-rouge">
<pre class="highlight"><code>myplot + theme_bw() + theme(panel.border = element_blank(), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"))
</code></pre>
</div>
<p><img src="http://felixfan.github.io/figure/ggplot-2-8.png" alt="plot of chunk ggplot-2-9" /></p>
<h4 id="further-reading">Further reading</h4>
<p><a href="http://stackoverflow.com/questions/10861773/remove-grid-background-color-and-top-and-right-borders-from-ggplot2" target="_blank">remove grid, background color and top and right borders from ggplot2</a></p>
</section>
</article>
]]></content:encoded>
			<wfw:commentRss>https://www.biofacebook.com/?feed=rss2&#038;p=1264</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Size Matters: Metabolic Rate and Longevity (Regression analysis sample)</title>
		<link>https://www.biofacebook.com/?p=1230</link>
		<comments>https://www.biofacebook.com/?p=1230#comments</comments>
		<pubDate>Wed, 30 Nov 2016 04:21:29 +0000</pubDate>
		<dc:creator><![CDATA[szypanther]]></dc:creator>
				<category><![CDATA[R语言]]></category>
		<category><![CDATA[兴趣杂项]]></category>
		<category><![CDATA[统计学习]]></category>

		<guid isPermaLink="false">http://www.biofacebook.com/?p=1230</guid>
		<description><![CDATA[Size Matters: Metabolic Rate and Longevity <p>John Tukey once said, “The best thing about being a statistician is that you get to play in everyone’s backyard.” I enthusiastically agree!</p> <p>I frequently enjoy reading and watching science-related material. This invariably raises questions, involving other &#8220;backyards,&#8221; that I can better understand using statistics. For instance, see my [...]]]></description>
				<content:encoded><![CDATA[<h1>Size Matters: Metabolic Rate and Longevity</h1>
<p>John Tukey once said, “The best thing about being a statistician is that you get to play in everyone’s backyard.” I enthusiastically agree!</p>
<p>I frequently enjoy reading and watching science-related material. This invariably raises questions, involving other &#8220;backyards,&#8221; that I can better understand using statistics. For instance, see my post about the <a href="http://blog.minitab.com/blog/adventures-in-statistics/using-statistics-to-analyze-words" target="_blank">statistical analysis of dolphin sounds</a>.</p>
<p>The latest topic that grabbed my attention was an apparent error in the BBC program<em> Wonders of Life</em>. In the episode “Size Matters,” Professor Brian Cox presents a graph with a linear regression line that illustrates the relationship between the size of mammals and their metabolic rate.</p>
<h2>How Does the Size of Mammals Affect Their Lives?</h2>
<p>Brian Cox, a theoretical physicist, is a really smart guy and one of my favorite science presenters. So, I was surprised when his interpretation of the linear regression model seemed incorrect. Below is a closer look at the graph he presents, and his claim.</p>
<p><img src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/8a06bcb9b811c6ac9e57b4d30d243058/wonders_of_life_size_matters2a_w640.jpeg" alt="Wonder of Life plot of Mass and Energy" /></p>
<p>Cox points out the straight line and states, “That implies, gram-for-gram, large animals use less energy than small animals . . . because the slope is less than one.”</p>
<p>For linear regression, the slope being less than 1 is irrelevant. Instead, the fact that it is a straight line indicates that the same relationship applies for both small and large mammals. If you increase mass by 1 unit for a small mammal and for a large mammal, metabolism increases by the same average amount for both sizes. In other words, gram-for-gram, size<em> doesn’t</em> seem to matter!</p>
<p>However, it’s unlikely that Cox would make such a fundamental mistake, so I conducted some research. It turns out that biologists use a log-log plot to model the relationship between the mass of mammals and their basal metabolic rate.</p>
<p>Perhaps Cox actually presented a log-log plot but glossed over the details?</p>
<p>If so, this dramatically changes the interpretation of this graph, because log-log plots transform both axes in order to <a href="http://blog.minitab.com/blog/adventures-in-statistics/curve-fitting-with-linear-and-nonlinear-regression" target="_blank">model curvature</a> while using linear regression. If the slope on a log-log plot of metabolic rate by mass is between 0 and 1, it indicates that the nonlinear effect of mass on metabolic rate lessens as mass increases.</p>
<p>This description fits Cox’s statements about the slope and how mass effects metabolic rate.</p>
<p>Let’s test Cox’s hypothesis ourselves! Thanks to the <a href="http://esapubs.org/archive/ecol/E090/184/" target="_blank">PanTHERIA</a> database*, we can fit the same type of log-log plot using similar data.</p>
<h2>Log-Log Plot of Mammal Mass and Basal Metabolic Rate</h2>
<p>I’ll use the Fitted Line Plot in Minitab <a href="http://www.minitab.com/products/minitab/" target="_blank">statistical software</a> to fit a regression line to 572 mammals, ranging from the masked shrew (4.2 grams) to the common eland (562,000 grams). You can find the data <a href="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/479b4fbd-f8c0-4011-9409-f4109cc4c745/File/39e5d03977d902080f0b51c0948ec706/pantheria_mammals.MTW">here</a>.</p>
<p>In Minitab, go to <strong>Stat &gt; Regression &gt; Fitted Line Plot</strong>. Metabolic rate is our response variable and adult mass is our predictor. Go to <strong>Options</strong> and check all four boxes under <strong>Transformations</strong> to produce the log-log plot.</p>
<p><img src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/c9cb0235600b8f4012ee65b4358317cc/metabolism.gif" alt="Fitted line plot of mammal mass by metabolic rate" /></p>
<p>The slope (0.7063) is significant (p = 0.000) and its value is consistent with recently published estimates. Because the slope is between 0 and 1, it confirms Cox’s interpretation. Gram-for-gram, larger animals use less energy than smaller animals. In order to function, a cell in a larger animal requires less energy than a cell in a smaller animal.</p>
<p>I’m quite amazed that the <a href="http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-how-do-i-interpret-r-squared-and-assess-the-goodness-of-fit" target="_blank">R-squared</a> is 94.3%! Usually you only see R-squared values this high for a low-noise physical process. Instead, these data were collected by a variety of researchers in different settings and cover a huge range of mammals that live in completely different habitats.</p>
<p>So Cox presented the correct interpretation after all: for mammals, size matters. However, he presented an oversimplified version of the underlying analysis by not mentioning the double-log transformations. This is television, after all!</p>
<p>There are important implications based on the fact that this model is curved rather than linear. If the increase in metabolic rate remained constant as mass increased (the linear model), we’d have to eat 16,000 calories a day to sustain ourselves. Further, mammals wouldn’t be able to grow larger than a goat due to overheating!</p>
<h2>Basal Metabolic Rate and Longevity</h2>
<p>Cox also presented the idea that mammals with a slower metabolism live longer than those with a faster metabolism. However, he didn’t present data or graphs to support this contention. Fortunately, we can test this hypothesis as well.</p>
<p>In Minitab, I used <strong>Calc &gt; Calculator</strong> to divide metabolic rate by grams. This division allows us to make the gram-for-gram comparison. Time for another fitted line plot with a double-log transformation!</p>
<p><img src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/c265a71515c74cadd531767030c974c0/longevity.gif" alt="Fitted line plot of metabolic rate by longevity" /></p>
<p>The negative <a href="http://blog.minitab.com/blog/adventures-in-statistics/how-to-interpret-regression-analysis-results-p-values-and-coefficients" target="_blank">slope is significant</a> (0.000) and tells us that as metabolic rate per gram increases, longevity decreases. The R-squared is 45.8%, which is pretty good considering that we’re looking at just one of many factors than can impact maximum lifespan!</p>
<p>However, it is not a linear relationship because this is a log-log plot. Maximum longevity asymptotically approaches a minimum value around 13 months as metabolism increases. The graph below shows the curvilinear relationship using the natural scale.</p>
<p><img src="http://cdn2.content.compendiumblog.com/uploads/user/458939f4-fe08-4dbc-b271-efca0f5a2682/742d7708-efd3-492c-abff-6044d78e3bbd/Image/3ff5b9eccf81f045944d5a86dccf3c2d/fitted_longevity.gif" alt="Fitted line for metabolic rate and longevity" /></p>
<p>A one-unit increase in the slower metabolic rates (left side of x-axis) produces much larger drops in longevity than a on-unit increase in the faster metabolic rates (right side of x-axis).</p>
<p>Once again, we’ve shown that size does matter! Larger mammals tend to have a slower metabolism. And animals that have a slower metabolism tend to live longer. That’s fortunate for us because without our slower metabolism, we’d only live about a year!</p>
]]></content:encoded>
			<wfw:commentRss>https://www.biofacebook.com/?feed=rss2&#038;p=1230</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>数据分析之美：如何进行回归分析</title>
		<link>https://www.biofacebook.com/?p=1208</link>
		<comments>https://www.biofacebook.com/?p=1208#comments</comments>
		<pubDate>Thu, 30 Jun 2016 04:21:27 +0000</pubDate>
		<dc:creator><![CDATA[szypanther]]></dc:creator>
				<category><![CDATA[R语言]]></category>
		<category><![CDATA[统计学习]]></category>

		<guid isPermaLink="false">http://www.biofacebook.com/?p=1208</guid>
		<description><![CDATA[1. 确定自变量与Y是否相关 证明：自变量X1，X2，&#8230;.XP中至少存在一个自变量与因变量Y相关 For any given value of n（观测数据的数目） and p（自变量X的数目）, any statistical software package can be used to compute the p-value associated with the F-statistic using this distribution. Based on this p-value, we can determine whether or not to reject H0. （用软件计算出的与F-statistic 相关的p-value来验证假设，the p-value associated with the F-statistic） 例子： Is there a relationship between [...]]]></description>
				<content:encoded><![CDATA[<h1>1. 确定自变量与Y是否相关</h1>
<div>证明：自变量X1，X2，&#8230;.XP中至少存在一个自变量与因变量Y相关</div>
<div>For any given value of n（观测数据的数目） and p（自变量X的数目）, any statistical software  package can be used to compute the p-value associated with the F-statistic using this distribution. Based on this p-value, we can determine whether or not to reject H0. （<strong>用软件计算出的与F-statistic 相关的p-value来验证假设，the p-value associated with the F-statistic</strong>）</div>
<div></div>
<div>例子：</div>
<div>
<div>Is there a relationship between advertising sales（销售额） and budget（广告预算：TV, radio, and newspaper）?</div>
<div>the p-value corresponding to the F-statistic in Table 3.6 is very low, indicating clear evidence of a relationship between advertising and sales.</div>
<p><img src="file:///C:/Users/IBM_ADMIN/AppData/Local/YNote/data/yangzhonglive@gmail.com/9049e4831aa14b4daf592485ff137632/clipboard.png" alt="" /></div>
<p><img src="http://img.blog.csdn.net/20150729220127950" alt="" /></p>
<p>背景知识回顾：</p>
<div>
<div><strong>t-statistic T统计量（t检验）与F-statistic</strong></div>
</div>
<div>t-statistic T统计量=（回归系数β的估计值-0）/β的标准误 ，which measures the number of standard deviations thatβis away from 0。用来对计量经济学模型中关于参数的单个假设进行检验的一种统计量。</div>
<div>我们一般用t统计量来检验回归系数是否为0做检验。例如：线性回归Y=β0+β1X，为了验证X与Y是否相关，</div>
<div>假设H0：X与Y无关,即β1=0</div>
<div>假设H1：X与Y相关,即β1不等于0</div>
<div>计算t-statistic， 如果t-statistic is far away from zero,则x和y相关。一般用p-values来检验X和Y是否相关。</p>
</div>
<div><strong>1）p-values（Probability，Pr）</strong></div>
<div>1 定义</div>
<div>pvalue的定义：在原假设正确的情况下，出现当前情况或者更加极端情况的概率。</div>
<div>p值是用来衡量统计显著性的常用指标。</div>
<div>P值( P-Value，Probability，Pr）即概率，反映某一事件发生的可能性大小。统计学根据显著性检验方法所得到的P 值，一般以P &lt; 0.05 为显著， P &lt;0.01 为非常显著，其含义是样本间的差异由抽样误差所致的概率小于0.05 或0.01。实际上，P 值不能赋予数据任何重要性，只能说明某事件发生的机率。</div>
<div></div>
<div>假设检验是推断统计中的一项重要内容。在假设检验中常见到P 值( P-Value，Probability，Pr)，P 值是进行检验决策的另一个依据。</div>
<div>大的pvalue说明还没有足够的证据拒绝原假设。</div>
<div></div>
<div><strong>2 为何有p-value</strong></div>
<div>P值方法的思路是先进行一项实验，然后观察实验结果是否符合随机结果的特征。研究人员首先提出一个他们想要推翻的“零假设”（null hypothesis），比如，两组数据没有相关性或两组数据没有显著差别。接下来，他们会故意唱反调，假设零假设是成立的，然后计算实际观察结果与零假设相吻合的概率。这个概率就是P值。费希尔说，P值越小，研究人员成功证明这个零假设不成立的可能性就越大。</div>
<div></div>
<div>其实理解起来很简单，基本原理只有两个：</div>
<div>1）一个命题只能证伪，不能证明为真</div>
<div>2）小概率事件不可能发生</div>
<div></div>
<div>证明逻辑就是：我要证明命题为真-&gt;证明该命题的否命题为假-&gt;在否命题的假设下，观察到小概率事件发生了-&gt;搞定。</div>
<div></div>
<div>3 demo</div>
<div>投飞镖，假设一个飞镖有10，9,8,7,6,5,4,3,2,1总共十个环（10是中心），定义合格投手为其真实水平能投到10~3环，而不管他临场表现如何。假设10~3环占靶子面积的95%。</div>
<div>H0：A是一个合格投手</div>
<div>H1：A不是合格投手</div>
<div></div>
<div>结合这个例子来看：证明A是合格的投手-》证明“A不是合格投手”的命题为假-》观察到一个事件（比如A连续10次投中10环），而这个事件在“A不是合格投手”的假设下，概率为p，小于0.05-&gt;小概率事件发生，否命题被推翻。</div>
<div></div>
<div>可以看到p越小-》这个事件越是小概率事件-》否命题越可能被推翻-》原命题越可信</div>
<p>&nbsp;</p>
<div><strong>2）F-statistic</strong></div>
<div>t检验是单个系数显著性的检验，检验一个变量X与Y是否相关，如电视上广告投入是否有利于销售额。T检验的原假设为某一解释变量的系数为0 。</div>
<div>F检验是是所有的自变量在一起对因变量的影响，当处理3个及其以上的时候（变量X1，X2，X3&#8230;等）用的是F检验。F检验的原假设为所有回归系数为0。</div>
<div>即F检验用于证明变量X1，X2，X3&#8230;中至少有一个变量和Y相关</div>
<div></div>
<div>F检验的原假设是H0：所有回归参数都等于0，所以F检验通过的话说明模型总体存在，F检验不通过，其他的检验就别做了，因为模型所有参数不显著异于0，相当于模型不存在（即没有任何一个变量X1，X2，X3&#8230; have no relationship with Y）。</div>
<h1><a name="t2"></a>2.确定有用的自变量子集</h1>
<div><strong>Do all the predictors help to explain Y , or is only a subset of the  predictors useful? （确定对Y有用的自变量）</strong></div>
<div></div>
<div>The first step in a multiple regression analysis is to compute the F-statistic and to examine the associated pvalue. If we conclude on the basis of that p-value that at least one of the predictors is related to the response, then it is natural to wonder which are the guilty ones!</p>
<p>The task of determining which predictors are associated with the response, in order to fit a single model involving only those predictors, is referred to as variable selection.</p>
<p>There are three classical approaches for this task:Forward selection.Forward selection.Forward selection.</p></div>
<div>1）Forward selection.</div>
<div>We begin with the null model—a model that conforward  selection null model tains an intercept but no predictors. We then fit p simple linear regressions</div>
<div>and add to the null model the variable that results in the  lowest RSS. We then add to that model the variable that results in the lowest RSS for the new two-variable model. This approach is  continued until some stopping rule is satisfied.</div>
<div></div>
<div>2)Backward selection.</div>
<div>We start with all variables in the model, and  backward remove the variable with the largest p-value—that is, the variable selection that is the least statistically significant. The new (p − 1)-variable model is fit, and the variable with the largest p-value is removed. This procedure continues until a stopping rule is reached. For instance, we may stop when all remaining variables have a p-value below some threshold.</div>
<div></div>
<div>3)Mixed selection.</div>
<div>This is a combination of forward and backward semixed  lection. We start with no variables in the model, and as with forward selection , we add the variable that provides the best fit. We continue to add variables one-by-one. Of course, as we noted with the Advertising example, the p-values for variables can become larger as new predictors are added to the model. Hence, if at any point the p-value for one of the variables in the model rises above a certain threshold, then we remove that variable from the model. We continue to perform these forward and backward steps until all variables in the model have a sufficiently low p-value, and all variables outside the model would have a large p-value if added to the model.</div>
<div></div>
<div>
<div>Compare:</div>
<div>Backward selection requires that the number of samples n is larger than  the number of variables p (so that the full model can be fit). In contrast, forward stepwise can be used even when n &lt; p, and so is the only viable subset method when p is very large.</div>
<div></div>
<div>How to  selecting the best model among a collection of models with different numbers of predictors?</div>
<div>
<div>Instead, we wish to choose a model with a low test error. As is evident here, the training error can be a poor estimate of the test error. Therefore, RSS and R2 are not suitable for selecting the best model among a collection of models with different numbers of predictors.</div>
<div></div>
<div>These approaches can be used to select among a set of models with different numbers of variables.</div>
<div></div>
<div>Cp, Akaike information criterion (AIC), Bayesian information criterion (BIC), and adjusted R2</div>
<div></div>
<div>In the past, performing cross-validation was computationally prohibitive  for many problems with large p and/or large n, and so AIC, BIC, Cp, and adjusted R2 were more attractive approaches for choosing among a set of models. However, nowadays with fast computers, the computations required to perform cross-validation are hardly ever an issue. Thus, crossvalidation is a very attractive approach for selecting from among a number of models under consideration.</div>
<div></div>
<div>TODO chapter6 &lt;Linear Model Selection and Regularization&gt;</div>
</div>
</div>
<p>&nbsp;</p>
<h1><a name="t3"></a>3.模型误差（RSE,R^2）</h1>
<div><strong>How well does the model fit the data?</strong></div>
<div></div>
<div>An R2 value close to 1 indicates that the model explains a large portion of the variance（自变量X） in the response variable（因变量Y）.<br />
It turns out that R2 will always increase when more variables are added to the model, even if those variables are only weakly associated with the response.</div>
<div></div>
<div>例子：</div>
<div>For the Advertising data, the RSE is 1,681units while the mean value for the response is 14,022, indicating a percentage error of roughly 12 %(RSE/mean value). Second, the R2 statistic records the percentage of variability in the response that is explained by the predictors. The predictors explain almost 90 % of the variance in sales.</div>
<div><img src="http://img.blog.csdn.net/20150729220202172" alt="" /></div>
<div></div>
<div></div>
<div>背景知识：</div>
<div>RSE标准差</div>
<div><img src="http://img.blog.csdn.net/20150729220240833" alt="" /></div>
<div></div>
<div>
<div>R2 Statistic（R-square）用于评判一个模型拟合好坏的重要标准</div>
<div>R平方介于0~1之间，越接近1，回归拟合效果越好，模型越精确。</div>
<div></div>
<div>R^2判定系数就是拟合优度判定系数，它体现了回归模型中自变量Y的变异在因变量X的变异中所占的比例。即用来表示y值中有多少可以用x值来解释（R2 measures the proportion<br />
of variability in Y that can be explained using X.），0.92的意思就是y值中有92%可以用x值来解释。</div>
<div>当R^2=1时表示，所有观测点都落在拟合的直线或曲线上；当R^2=0时，表示自变量与因变量不存在直线或曲线关系。</div>
<div></div>
<div><img src="file:///C:/Users/IBM_ADMIN/AppData/Local/YNote/data/yangzhonglive@gmail.com/b2ad5452a9e740c48da36b0d1f0139e2/clipboard.png" alt="" /></div>
<div>如何根据R-squared判断模型是否准确？</div>
<div>However, it can  still be challenging to determine what is a good R2 value, and in general,  this will depend on the application. For instance, in certain problems in</div>
<div>physics, we may know that the data truly comes from a linear model with  a small residual error. In this case, we would expect to see an R2 value that is extremely close to 1, and a substantially smaller R2 value might indicate a serious problem with the experiment in which the data were generated. On the other hand, in typical applications in biology, psychology, marketing, and other domains, the linear model (3.5) is at best an extremely rough approximation to the data, and residual errors due to other unmeasured factors are often very large. In this setting, we would expect only a very small proportion of the variance in the response to be explained by the</div>
<div>predictor, and an R2 value well below 0.1 might be more realistic!</div>
</div>
<h1><a name="t4"></a>4.应用模型：Y准确度（置信度，置信区间，预测区间）</h1>
<div>Given a set of predictor values, what response value should we predict, and how accurate is our prediction?</div>
<div>Once we have fit the multiple regression model, it is straightforward to apply  in order to predict the response Y on the basis of a set of values for the predictors X1, X2, . . . , Xp.</p>
</div>
<div>We can compute a confidence interval （置信区间）in order to determine how close Y'(用模型计算出的值) will be to f(X)（理论中的真实值）.</div>
<div>predict an individual response use prediction interval, predict the average response use confidence interval.</div>
<div></div>
<div><img src="http://img.blog.csdn.net/20150729220233372" alt="" /></div>
<div><strong>confidence interval置信区间 与预测区间</strong></div>
<div></div>
<div>1 置信区间</div>
<div>表示在给定预测变量的指定设置时，平均响应可能落入的范围。</div>
<div>置信区间是结合置信度来说的，简单来说就是随机变量有一定概率落在一个范围内，这个概率就叫置信度，范围就是对应的置信区间。</div>
<div>真实数据往往是实际上不能获知的，我们只能进行估计，估计的结果是给出一对数据，比如从1到1.5，真实的值落在1到1.5之间的可能性是95%（也有5%的可能性在这区间之外的）。</div>
<div>90%置信区间（Confidence Interval,CI）：当给出某个估计值的90%置信区间为【a,b】时，可以理解为我们有90%的信心（Confidence）可以说样本的平均值介于a到b之间，而发生错误的概率为10%。</div>
<div>2 预测区间Prediction Interval</div>
<div>表示在给定预测变量的指定设置时，单个观测值可能落入的范围。</div>
<div>预测区间PI总是要比对应的置信区间CI大，这是因为在对单个响应与响应均值的预测中包括了更多的不确定性。</div>
<div></div>
<div></div>
<div>The basic syntax is lm(y∼x,data), where y is the response（预测值）, x is the predictor（影响因子：x1,x2）, and data is the data set in which these two variables are kept.</div>
<h1><a name="t5"></a>5.模型修正</h1>
<h2><a name="t6"></a>1）各自变量X1，X2&#8230;对因变量Y的影响 程度</h2>
<div>Which media contribute to sales?</div>
<div>To answer this question, we can examine the p-values associated with  each predictor’s t-statistic. In the multiple linear regression displayed in Table 3.4, the p-values for TV and radio are low,but the p-value for newspaper is not. This suggests that only TV and radio are related to sales.</div>
<p><img src="http://img.blog.csdn.net/20150729220259814" alt="" /></p>
<h2><a name="t7"></a>2）解决共线性问题</h2>
<div>所谓多重共线性（Multicollinearity）是指线性回归模型中的解释<a href="http://baike.baidu.com/view/296689.htm" target="_blank">变量</a>之间由于存在精确<a href="http://baike.baidu.com/view/510626.htm" target="_blank">相关关系</a>或高度相关关系而使模型估计失真或难以估计准确。一般来说，由于经济数据的限制使得模型设计不当，导致设计<a href="http://baike.baidu.com/view/10337.htm" target="_blank">矩阵</a>中解释变量间存在普遍的相关关系。</div>
<div>如何解决共线性问题？</div>
<div>方差膨胀因子（Variance Inflation Factor，VIF）：容忍度的倒数，VIF越大，显示<a href="http://baike.baidu.com/view/222502.htm" target="_blank">共线性</a>越严重。经验判断方法表明：当0＜VIF＜10，不存在<a href="http://baike.baidu.com/view/1334455.htm" target="_blank">多重共线性</a>；当10≤VIF＜100，存在较强的多重共线性；当VIF≥100，存在严重多重共线性。</div>
<div></div>
<div>
<h2><a name="t8"></a>3）交互项系数（interaction terms）</h2>
<div>衡量的是一个变量对于“另一个变量对因变量影响能力”的影响。</div>
<div>Is there synergy among the advertising media?</div>
<div>Perhaps spending $50,000 on television advertising and $50,000 on  radio advertising results in more sales than allocating $100,000 to either television or radio individually. In marketing, this is known as</div>
<div>a synergy effect, while in statistics it is called an interaction effect.</div>
<div></div>
<div>何时适合在模型中加入交互系数？</div>
<div><img src="http://img.blog.csdn.net/20150729220403415" alt="" /></div>
</div>
<h2><a name="t9"></a>4）异常值outlier检测</h2>
<div>Residual plots（残差散点图） can be used to identify outliers. 检测到异常值后，从数据中去掉异常值，再生成纠正后的模型。<br />
残差是指观测值与预测值（拟合值）之间的差，即是实际观察值与回归估计值的差。 残差分析就是通过残差所提供的信息，分析出数据的可靠性、周期性或其它干扰。在线性回归中,残差的重要应用之一是根据它的绝对值大小判定异常点。</div>
<div>But in practice, it can be difficult to decide how large a residual needs to be before we consider the point to be an outlier. To address this problem, instead of plotting the residuals, we can plot the studentized residuals, computed by dividing each residual ei by its estimated standard studentized error. Observations whose studentized residuals are greater than 3 in abso- residuallute value are possible outliers.</div>
]]></content:encoded>
			<wfw:commentRss>https://www.biofacebook.com/?feed=rss2&#038;p=1208</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments</title>
		<link>https://www.biofacebook.com/?p=912</link>
		<comments>https://www.biofacebook.com/?p=912#comments</comments>
		<pubDate>Thu, 08 May 2014 07:35:43 +0000</pubDate>
		<dc:creator><![CDATA[szypanther]]></dc:creator>
				<category><![CDATA[R语言]]></category>
		<category><![CDATA[proteomic]]></category>

		<guid isPermaLink="false">http://www.biofacebook.com/?p=912</guid>
		<description><![CDATA[<p>MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments.</p> ]]></description>
				<content:encoded><![CDATA[<p><a href="http://bioinformatics.oxfordjournals.org/content/early/2014/05/02/bioinformatics.btu305.abstract.html?papetoc">MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments</a>.</p>
]]></content:encoded>
			<wfw:commentRss>https://www.biofacebook.com/?feed=rss2&#038;p=912</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>R PheWAS: data analysis and plotting tools for phenome-wide association studies in the R environment</title>
		<link>https://www.biofacebook.com/?p=909</link>
		<comments>https://www.biofacebook.com/?p=909#comments</comments>
		<pubDate>Thu, 08 May 2014 07:33:41 +0000</pubDate>
		<dc:creator><![CDATA[szypanther]]></dc:creator>
				<category><![CDATA[R语言]]></category>

		<guid isPermaLink="false">http://www.biofacebook.com/?p=909</guid>
		<description><![CDATA[<p>R PheWAS: data analysis and plotting tools for phenome-wide association studies in the R environment.</p> ]]></description>
				<content:encoded><![CDATA[<p><a href='http://bioinformatics.oxfordjournals.org/content/early/2014/05/06/bioinformatics.btu197.abstract.html?papetoc'>R PheWAS: data analysis and plotting tools for phenome-wide association studies in the R environment</a>.</p>
]]></content:encoded>
			<wfw:commentRss>https://www.biofacebook.com/?feed=rss2&#038;p=909</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>R pheatmap</title>
		<link>https://www.biofacebook.com/?p=867</link>
		<comments>https://www.biofacebook.com/?p=867#comments</comments>
		<pubDate>Wed, 12 Feb 2014 04:04:41 +0000</pubDate>
		<dc:creator><![CDATA[szypanther]]></dc:creator>
				<category><![CDATA[My project]]></category>
		<category><![CDATA[R语言]]></category>

		<guid isPermaLink="false">http://www.biofacebook.com/?p=867</guid>
		<description><![CDATA[<p>&#62; library(caTools); &#62; library(bitops); &#62; library(grid); &#62; data=read.csv(&#8220;/home/shenzy/Desktop/R/Bac.heatmap1.csv&#8221;) &#62; data=read.csv(&#8220;/home/shenzy/Desktop/R/Bac.heatmap1.2.csv&#8221;) &#62; View(data) &#62; data=read.csv(&#8220;/home/shenzy/Desktop/R/Bac.heatmap1.2.csv&#8221;,sep=&#8221;\t&#8221;) &#62; View(data) &#62; row.names(data) &#60;- data$X.OTU.ID; &#62; View(data) &#62; data_matrix&#60;-data[,2:15] &#62; View(data_matrix) &#62; data_matrix&#60;-data[,2:14] &#62; View(data_matrix) &#62; View(data) &#62; data_matrix&#60;-data[,1:14] &#62; View(data_matrix) &#62; library(pheatmap) &#62; data_matrix[is.na(data_matrix)]&#60;-1 &#62; View(data_matrix) &#62; data_log10&#60;-log10(data_matrix) &#62; View(data_log10) &#62; data_log2&#60;-log2(data_matrix) &#62; View(data_log2) &#62; pheatmap(data_log2,fontsize=9, fontsize_row=6) &#62; pheatmap(data_log2, [...]]]></description>
				<content:encoded><![CDATA[<p>&gt; library(caTools);<br />
&gt; library(bitops);<br />
&gt; library(grid);<br />
&gt; data=read.csv(&#8220;/home/shenzy/Desktop/R/Bac.heatmap1.csv&#8221;)<br />
&gt; data=read.csv(&#8220;/home/shenzy/Desktop/R/Bac.heatmap1.2.csv&#8221;)<br />
&gt; View(data)<br />
&gt; data=read.csv(&#8220;/home/shenzy/Desktop/R/Bac.heatmap1.2.csv&#8221;,sep=&#8221;\t&#8221;)<br />
&gt; View(data)<br />
&gt; row.names(data) &lt;- data$X.OTU.ID;<br />
&gt; View(data)<br />
&gt; data_matrix&lt;-data[,2:15]<br />
&gt; View(data_matrix)<br />
&gt; data_matrix&lt;-data[,2:14]<br />
&gt; View(data_matrix)<br />
&gt; View(data)<br />
&gt; data_matrix&lt;-data[,1:14]<br />
&gt; View(data_matrix)<br />
&gt; library(pheatmap)<br />
&gt; data_matrix[is.na(data_matrix)]&lt;-1<br />
&gt; View(data_matrix)<br />
&gt; data_log10&lt;-log10(data_matrix)<br />
&gt; View(data_log10)<br />
&gt; data_log2&lt;-log2(data_matrix)<br />
&gt; View(data_log2)<br />
&gt; pheatmap(data_log2,fontsize=9, fontsize_row=6)<br />
&gt; pheatmap(data_log2, color = colorRampPalette(c(&#8220;white&#8221;, &#8220;yellow&#8221;, &#8220;firebrick3&#8243;))(50), fontsize=9, fontsize_row=6)<br />
&gt; pheatmap(data_log2, color = colorRampPalette(c(&#8220;white&#8221;, &#8220;blue&#8221;, &#8220;firebrick3&#8243;))(50), fontsize=9, fontsize_row=6)<br />
&gt; help(pheatmap)<br />
&gt; pheatmap(data_log2, color = colorRampPalette(c(&#8220;white&#8221;, &#8220;blue&#8221;, &#8220;firebrick3&#8243;))(50), fontsize=9, fontsize_row=6,cluster_rows=FALSE, cluster_cols=FALSE)<br />
&gt; pheatmap(data_log2, color = colorRampPalette(c(&#8220;white&#8221;, &#8220;blue&#8221;, &#8220;firebrick3&#8243;))(50), fontsize=9, fontsize_row=6)<br />
&gt; data2=read.csv(/home/shenzy/Desktop/R/Bac.heatmap1.3_GZ.csv)<br />
Error: unexpected &#8216;/&#8217; in &#8220;data2=read.csv(/&#8221;<br />
&gt; data2=read.csv(&#8220;/home/shenzy/Desktop/R/Bac.heatmap1.3_GZ.csv&#8221;,sep=&#8221;\t&#8221;)<br />
&gt; View(data2)<br />
&gt; row.names(data2) &lt;- data2$X.OTU.ID;<br />
&gt; data3=read.csv(&#8220;/home/shenzy/Desktop/R/Bac.heatmap1.4_SH.csv&#8221;,sep=&#8221;\t&#8221;)<br />
&gt; row.names(data3) &lt;- data3$X.OTU.ID;<br />
&gt; View(data3)<br />
&gt; data2_matrix&lt;-data2[,1:7]<br />
&gt; View(data2_matrix)<br />
&gt; data3_matrix&lt;-data3[,1:7]<br />
&gt; View(data3_matrix)<br />
&gt; data2_matrix[is.na(data2_matrix)]&lt;-1<br />
&gt; data3_matrix[is.na(data3_matrix)]&lt;-1<br />
&gt; data2_log2&lt;-log2(data2_matrix)<br />
&gt; data3_log2&lt;-log2(data3_matrix)<br />
&gt; pheatmap(data2_log2, color = colorRampPalette(c(&#8220;white&#8221;, &#8220;blue&#8221;, &#8220;firebrick3&#8243;))(50), fontsize=9, fontsize_row=6,cluster_rows=FALSE, cluster_cols=FALSE)<br />
&gt; pheatmap(data2_log2, color = colorRampPalette(c(&#8220;white&#8221;, &#8220;blue&#8221;, &#8220;firebrick3&#8243;))(50), fontsize=9, fontsize_row=6)<br />
&gt; pheatmap(data3_log2, color = colorRampPalette(c(&#8220;white&#8221;, &#8220;blue&#8221;, &#8220;firebrick3&#8243;))(50), fontsize=9, fontsize_row=6,cluster_rows=FALSE, cluster_cols=FALSE)<br />
&gt; pheatmap(data3_log2, color = colorRampPalette(c(&#8220;white&#8221;, &#8220;blue&#8221;, &#8220;firebrick3&#8243;))(50), fontsize=9, fontsize_row=6)</p>
<p><a href="http://www.biofacebook.com/wp-content/uploads/2014/02/BACTERIA_1.21.png"><img class="aligncenter size-full wp-image-869" title="BACTERIA_1.2" src="http://www.biofacebook.com/wp-content/uploads/2014/02/BACTERIA_1.21.png" alt="" width="550" height="450" /></a></p>
<p>&nbsp;</p>
<p><a href="http://www.biofacebook.com/wp-content/uploads/2014/02/BACTERIA_1.2cluster.png"><img class="aligncenter size-full wp-image-870" title="BACTERIA_1.2cluster" src="http://www.biofacebook.com/wp-content/uploads/2014/02/BACTERIA_1.2cluster.png" alt="" width="550" height="450" /></a></p>
]]></content:encoded>
			<wfw:commentRss>https://www.biofacebook.com/?feed=rss2&#038;p=867</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
