%%% R-courseEH.Rnw
%% Author: emanuelheitlinger@gmail.com
\documentclass[12pt,a4paper]{article}
\usepackage[debugshow,final]{graphics}
\usepackage{float}
\usepackage{lscape}
\begin{document}
\title{Transcript of Mick Crawley's R course 2010\\
Imperial College London, Silwood Park}
\author{Emanuel G Heitlinger}
\date{}
\maketitle
\setkeys{Gin}{width=\linewidth}
Disclaimer:
The following document is a private transcript of Mick Crawley's R course. I am a participant in this course, and my write-up has in no way been approved by Mick Crawley (to whom the ideas behind the code and the teaching concepts belong) or any of his staff.
\section*{Interlude: The perfect error bar}
\subsection*{What should error bars tell us?}
\textbf{They should be overlapping if there is no significant difference between the means, non-overlapping if there is significant difference between the means.}
\subsection*{What bars are used and what is wrong about it?}
In a t-test (assuming normal errors and equal variances of the two samples) we use the difference between the two means divided by the standard error of the difference:\\
$\displaystyle\frac{\bar{x}-\bar{y}}{SE_{difference}}$\\
Or more verbosely:\\
$\displaystyle\frac{\bar{x}-\bar{y}}{\sqrt{\frac{s^2_{A}}{n_{A}}+\frac{s^2_{B}}{n_{B}}}}$\\
This value (the test statistic) is compared with the critical value from tables. If its absolute value is bigger than the value from tables, we accept the \textbf{alternative hypothesis}: a significant difference.\\
\subsubsection*{Bars of length one standard error (1 s.e.)}
Remember that the standard error of the mean is:\\
$SE_{\bar{y}}=\displaystyle\sqrt{\frac{s^2}{n}}$\\
Error bars derived from the standard error do not incorporate the value from tables. Instead, the two individual standard errors are simply placed end to end, which amounts to a multiplier of 2. Two is often close to the value from the table, but it is multiplied with the wrong quantity: the standard error of one sample is always smaller than the standard error of the difference (about 1.4 times smaller when variances and sample sizes are equal).
Error bars of 1 s.e. can therefore be too short. They can fail to indicate non-significance: the bars may not overlap although the means are not significantly different.
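Where the factor of about 1.4 comes from can be sketched as follows, assuming (for illustration only) that both samples have the same variance $s^2$ and the same size $n$:\\
$SE_{difference}=\displaystyle\sqrt{\frac{s^2}{n}+\frac{s^2}{n}}=\sqrt{2}\,\sqrt{\frac{s^2}{n}}=\sqrt{2}\;SE_{\bar{y}}\approx 1.41\;SE_{\bar{y}}$\\
Under this assumption, two bars of 1 s.e. stop overlapping as soon as the means differ by more than $2\,SE_{\bar{y}}\approx 1.41\,SE_{difference}$, which is less than the roughly $2\,SE_{difference}$ that the t-test asks for.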
<<fig=TRUE>>=
x <- c(1,2,3,3,2,3,4,5,3,2,5,4,4,6)
y <- x+1
barplot(c(X=mean(x), Y=mean(y)), ylim=c(0, 6),
main="means with error bars of 1 s.e.")
lines(c(0.7,0.7), c(mean(x)-sqrt(var(x)/length(x)),
mean(x)+sqrt(var(x)/length(x))), lwd=3)
lines(c(1.9,1.9), c(mean(y)-sqrt(var(y)/length(y)),
mean(y)+sqrt(var(y)/length(y))),lwd=3)
@
Here are two means with non-overlapping standard error bars.
Are they significantly different? \textbf{No!}
<<>>=
t.test(x, y, var.equal=TRUE)
@
\subsubsection*{Bars of the length of a confidence interval}
These are given as follows:\\
$CI_{95\%} = t_{(\alpha, d.f.)}\displaystyle\sqrt{\frac{s^2}{n}}$\\
To generate error bars based on confidence intervals, the t-value (from the table) is multiplied with each of the two standard errors. The value from the table therefore enters the comparison twice and inflates the error bars. That we are still using the wrong standard error does not save us: it is about 1.4 times smaller than the correct one (that of the difference), but the surplus t from the table is always bigger than 1.4, so the bars remain too long.
<<fig=TRUE>>=
w <- x+1.2
barplot(c(X=mean(x), W=mean(w)), ylim=c(0, 7),
main="Error bars of 95% confidence intervals")
lines(c(0.7,0.7), c(mean(x)- qt(0.975, length(x)-1)*sqrt(var(x)/length(x)),
mean(x)+ qt(0.975, length(x)-1)*sqrt(var(x)/length(x))),
lwd=3)
lines(c(1.9,1.9), c(mean(w)- qt(0.975, length(w)-1)*sqrt(var(w)/length(w)),
mean(w)+ qt(0.975, length(w)-1)*sqrt(var(w)/length(w))),
lwd=3)
@
Here are two other means with overlapping 95\% confidence interval bars. The bars overlap, but here is the t-test: \textbf{significant!}
<<>>=
t.test(x, w, var.equal=TRUE)
@
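How far apart two means must be before each kind of bar stops overlapping can be checked numerically. The following chunk is a minimal sketch and not part of the course material. It repeats the data so that it runs on its own, and the names \texttt{se}, \texttt{se.diff}, \texttt{gap.se}, \texttt{gap.ci} and \texttt{gap.t} are made up for this illustration.
<<>>=
## same data as above, repeated so the chunk runs on its own
x <- c(1,2,3,3,2,3,4,5,3,2,5,4,4,6)
w <- x+1.2
n <- length(x)
se <- sqrt(var(x)/n)                  # s.e. of one mean (same for x and w)
se.diff <- sqrt(var(x)/n + var(w)/n)  # s.e. of the difference
gap.se <- 2*se                        # 1 s.e. bars separate beyond this
gap.ci <- 2*qt(0.975, n-1)*se         # 95% CI bars separate beyond this
gap.t <- qt(0.975, 2*n-2)*se.diff     # t-test is significant beyond this
round(c(se.bars=gap.se, t.test=gap.t, ci.bars=gap.ci,
        observed=mean(w)-mean(x)), 2)
@
The observed difference lies above the threshold of the t-test but below the one implied by the confidence interval bars, which is why the bars overlap although the test is significant.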
\subsection*{The perfect error bar}
The error bar should have the length of half the \textbf{Least Significant Difference (LSD)}.\\
The least significant difference is found by inserting it into the formula for the t-test statistic in place of the difference between the means and rearranging.\\
$t_{(\alpha, d.f)}=\displaystyle\frac{LSD}{SE_{difference}}$\\
$LSD=t_{(\alpha, d.f)}SE_{difference}$\\
or more verbosely\\
$LSD=t_{(\alpha, d.f)}\displaystyle\sqrt{\frac{s^2_{A}}{n_{A}}+\frac{s^2_{B}}{n_{B}}}$\\
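The relation $LSD=t_{(\alpha, d.f)}SE_{difference}$ can be turned into a small helper. This is a sketch only: the function \texttt{lsd} and its argument names are hypothetical and not taken from the course, and the data are repeated so that the chunk runs on its own.
<<>>=
## hypothetical helper, not from the course: LSD for two samples
lsd <- function(a, b, alpha=0.05){
    se.diff <- sqrt(var(a)/length(a) + var(b)/length(b))
    qt(1-alpha/2, length(a)+length(b)-2) * se.diff
}
## same data as above, repeated so the chunk runs on its own
x <- c(1,2,3,3,2,3,4,5,3,2,5,4,4,6)
y <- x+1
w <- x+1.2
## TRUE means: the difference exceeds the LSD, i.e. it is significant
c(x.vs.y = mean(y)-mean(x) > lsd(x, y),
  x.vs.w = mean(w)-mean(x) > lsd(x, w))
@
A difference between two means is significant at the chosen level exactly when it exceeds the LSD, so this comparison should agree with the two t-tests.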
<<fig=TRUE>>=
## draw a bar reaching 1/2 LSD above and below mean(x) at position loc
LSDline <- function(x, y, loc){
    ## half the LSD: t (from table) times s.e. of the difference, over 2
    half.lsd <- qt(0.975, length(c(x,y))-2)/2 *
        sqrt(var(x)/length(x)+var(y)/length(y))
    lines(c(loc,loc), c(mean(x)-half.lsd, mean(x)+half.lsd), lwd=3)
}
par(mfrow=c(1,2))
barplot(c(X=mean(x), Y=mean(y)), main="error bars of length 1/2 LSD", ylim=c(0,6))
LSDline(x, y, 0.7)
LSDline(y, x, 1.9)
barplot(c(X=mean(x), W=mean(w)), main="error bars of length 1/2 LSD", ylim=c(0,6))
LSDline(x, w, 0.7)
LSDline(w, x, 1.9)
@
Remember the t-test's results: \textbf{Bingo!}
But what if you want to compare more than two means? What if assumptions of the t-test are violated?
\textbf{Best not to use barplots at all: much ink, low information content.} Boxplots are an alternative, and they are even non-parametric!
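As a minimal sketch of that alternative (not part of the course transcript), the three samples used above can be shown as boxplots. The data are repeated so that the chunk runs on its own, and the object name \texttt{bp} is made up for this illustration.
<<fig=TRUE>>=
## same data as above, repeated so the chunk runs on its own
x <- c(1,2,3,3,2,3,4,5,3,2,5,4,4,6)
y <- x+1
w <- x+1.2
bp <- boxplot(list(X=x, Y=y, W=w), main="boxplots of the three samples")
## rows: lower whisker, lower hinge, median, upper hinge, upper whisker
bp$stats
@
Each box shows the median and the spread of a sample rather than a single mean with a bar.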
\end{document}