%%% R-courseEH.Rnw
%% Author: emanuelheitlinger@gmail.com
\documentclass[12pt,a4paper]{article}
\usepackage[debugshow,final]{graphics}
\usepackage{float}
\usepackage{lscape}
\begin{document}
\title{Transcript of Mick Crawley's R course 2010
Imperial College London, Silwood Park}
\author{Emanuel G Heitlinger}
\date{}
\maketitle
\setkeys{Gin}{width=\linewidth}
Disclaimer:
The following document is a private transcript of Mick Crawley's R-course. I am a participant in this course and my writeup has in no way been approved by Mick Crawley (from whom the ideas behind the code and teaching concepts are) or any of his staff.
\section*{Day 5: ANOVA 2}
\section*{Two way factorial Anova}
If a experiment is replicated within each of its experimental plots we can use factorial Anova.
<<>>=
fa <- read.table("factorial.txt", header=TRUE)
@
To calculate the usual SSA and SSB (sums of squares for the two factors) we need our function from non-factorial Anova again. Remember:\\
$SSA=\displaystyle\frac{\sum{C^2}}{n}-\frac{(\sum{y})^2}{kn}$\\
<<>>=
CF <- function(x) sum(x)^2/length(x)
SSA.aov <- function(x, fac){
sumCs <- sum(as.vector((tapply(x, fac, sum)^2)/tapply(x, fac, length)))
sumCs-CF(x)
}
ssa <- SSA.aov(fa$growth, fa$diet)
ssb <- SSA.aov(fa$growth, fa$coat)
@
Factorial Anova involves additionally looking at the \textbf{interaction sum of squares (SSAB)}.\\
$SSAB=\displaystyle\frac{\sum{Q^2}}{n}-SSA-SSB-CF$\\
Where $CF$ is as usual the correction factor\\
$CF=\displaystyle\frac{(\sum{y})^2}{n}$\\
and $Q/n$ is the total of each factor combination over the numbers in this combination. We can calculate this using the combied factor levels of two factors.
<<>>=
Q.aov <- function(x, fac){
sum(as.vector((tapply(x, fac, sum)^2)/tapply(x, fac, length)))
}
ssab <- Q.aov(fa$growth, fa$diet:fa$coat)-ssa-ssb-CF(fa$growth)
sst <- sum(fa$growth^2)-sum(fa$growth)^2/length(fa$growth)
c(DIET=ssa, COAT=ssb, INTERACTION=ssab, residuals.eror=sst-ssa-ssb-ssab)
@
I leave the proper table out now, as it should be clear how to go from the sums of squares with the degrees of freedoms to the probablilities via variances (mean squares) and F statistics.
<<>>=
fa.mod <- aov(fa$growth~fa$diet*fa$coat)
summary(fa.mod)
@
This is a first example of model simplification. As the interaction-term of the two factors seems to be non-significant we try a model witout it:
<<>>=
fa.mod2 <- update(fa.mod, ~. - fa$diet:fa$coat)
summary(fa.mod2)
@
We have to use an Anova on the two models to comopare wether the first explaines the data significantly better.
<<>>=
anova(fa.mod, fa.mod2)
@
The simpler model is not significantly different from the complicated one and we have to think about further simplification from the second model.
<<>>=
fa.mod3 <- update(fa.mod2, ~. -fa$diet)
summary(fa.mod3)
anova(fa.mod2, fa.mod3)
@
The model 3 is not significantly worse than 2 and we are left with no significance, but there was some effect of coat when diet was included. We need to take a closer look:
<<>>=
tapply(fa$growth, fa$diet, mean)
@
Diet A and B seem to produce similar responses and can therefore be included in the model as a single factor level. Try modelling with this new two level factor.
<<>>=
new.fac <- as.factor(ifelse(fa$diet=="C", "C", "AB"))
fa.mod4 <- update(fa.mod3, ~. +new.fac)
summary(fa.mod4)
anova(fa.mod4, fa.mod3)
@
The new model is significantly better than the old one, because reducing the factor levels to 2 saved us a degree of freedom. In fact this is a first exemple of contrasts, we will meet them in another practical im more detail.
Is this enough, or do we also have to include the interaction term of diet and the new factor?
<<>>=
fa.mod5 <- update(fa.mod4, ~. +fa$coat:new.fac)
summary(fa.mod5)
anova(fa.mod4, fa.mod5)
@
No, the interaction is not significant and model 5 is not better than 4. Model 4 (fa.mod4) is our minimal adequate model.
\subsection*{Three way factorial Anova}
We load a dataset, that has three factors and repeated measurements for factor combinations.
<<>>=
dada <- read.table("daphnia.txt", header=TRUE)
attach(dada)
@
The simple sums of squares and the interaction sums of squares are calculated exactly as above. The new quantitie to calculate is the \textbf{three way interaction sum of squares(SSABC)}\\
$SSABC=\displaystyle\frac{\sum{T^2}}{n}-SSA-SSB-SSC-SSAB-SSAC-AABC-CF$\\
We can calculate all these quantities using functions we already have:
<<>>=
sst <- sum(Growth.rate^2)-sum(Growth.rate)^2/length(Growth.rate)
ssa <- SSA.aov(Growth.rate, Water)
ssb <- SSA.aov(Growth.rate, Detergent)
ssc <- SSA.aov(Growth.rate, Daphnia)
ssab <- Q.aov(Growth.rate, Water:Detergent)-ssa-ssb-CF(Growth.rate)
ssac <- Q.aov(Growth.rate, Water:Daphnia)-ssa-ssc-CF(Growth.rate)
ssbc <- Q.aov(Growth.rate, Detergent:Daphnia)-ssb-ssc-CF(Growth.rate)
ssabc <- Q.aov(Growth.rate, Water:Detergent:Daphnia)-ssa-ssb-ssc-ssab-ssac-ssbc-CF(Growth.rate)
sse <- sst-ssa-ssb-ssc-ssab-ssac-ssbc-ssabc
round(c(SST=sst, SSA=ssa, SSB=ssb, SSC=ssc, SSAB=ssab, SSAC=ssac, SSBC=ssbc, SSABC=ssabc, SSE=sse), 3)
@
Compare this sloppy presentation of the different sums of squares to these from the model:
<<>>=
fac <- aov(Growth.rate~Water*Detergent*Daphnia)
summary(fac)
@
The three way interaction can be removed and one of the two way interactions:
<<>>=
fac2 <- update(fac, ~. -Water:Detergent:Daphnia)
summary(fac2)
anova(fac, fac2)
fac3 <- update(fac2, ~. -Water:Detergent)
summary(fac3)
anova(fac2, fac3)
@
To get a instant view on the means we can use
<<>>=
model.tables(fac3)
@
To illustrate we can use plot.design and interaction.plot
<>=
plot.design(dada)
@
<>=
interaction.plot(Detergent, Daphnia, Growth.rate)
@
\end{document}