why isn't fit <- eBayes(fit); topTable(fit, coef=4)
the same as fit <- contrasts.fit(fit, c(-1,0,0,1)); fit <- eBayes(fit); topTable(fit)
(column 1 of the design beeing the intercept)?
Strain <- factor(targets$Strain, levels=c("WT","Mu"))
Treatment <- factor(targets$Treatment, levels=c("U","S"))
design <- model.matrix(~Strain+Strain:Treatment)
colnames(design)
[1] "(Intercept)" "StrainMu" "StrainWT:TreatmentS" "StrainMu:TreatmentS"
The first term in the model formula is an effect for Strain. This introduces an intercept column to the design matrix, which estimates the average log-expression level for wild-type unstimulated cells, and a column for Strain which estimates the mutant vs wildtype dierence in the unstimulated state. The second term in the model formula represents the interaction between stimulation and strain. [...] It introduces a third and a fourth column to the design matrix which represent the effect of stimulation for wild-type and for mutant mice respectively [...].
fit <- lmFit(eset, design)
fit <- eBayes(fit)
topTable(fit, coef=3)
# will find those genes responding to stimulation in wild-type mice, and
topTable(fit, coef=4)
# will find those genes responding to stimulation in mutant mice
If using coef
is the same as looking at the difference between the 4th column of the design matrix and the intercept (i.e. the contrast between the fourth and first column), wouldn't we need to look at the contrast between the fourth and second column to get the genes responding to stimulation in mutant mice?
Of course I compared the results when using coef
and when using contrasts. They differ but I do not understand why... Obviously it means that coef=4
does not mean "look at the difference between column 4 and the intercept", but what does it mean then?
I hope that the question is understandable. Many thanks in advance!
The design matrix is based on
targets <- data.frame(
Strain = factor(c("WT", "WT", "MU", "MU", "MU"), levels = c("WT", "MU")),
Treatment = factor(c("U", "S", "U", "S", "S"), levels = c("U", "S")))
design <- model.matrix(~ Strain + Strain:Treatment, data = targets)
> targets
## Strain Treatment
## 1 WT U
## 2 WT S
## 3 MU U
## 4 MU S
## 5 MU S
Each row of targets
corresponds to an experimental sample. The design matrix looks like this:
## (Intercept) StrainMU StrainWT:TreatmentS StrainMU:TreatmentS
## 1 1 0 0 0
## 2 1 0 1 0
## 3 1 1 0 0
## 4 1 1 0 1
## 5 1 1 0 1
Again, each row corresponds to an experimental sample. The columns of design
correspond to coefficients that are fitted by limma
and you can read off what combination of coefficients gives the model-fitted value for a given experimental group by comparing the rows of design
with those of targets
.
Looking at coef=4
effectively means that you're testing the null hypothesis that the fourth coefficient (that for StrainMu:TreatmentS
) is zero - it isn't the same as comparing the value of the fourth coefficient against the value of the intercept coefficient.
Think in terms of the fitted value for each of the experimental classes. For a given gene,
if I was a wildtype, unstimulated mouse (as in the first row of design
or targets
), my fitted value would be:
Intercept
if I was a wildtype, stimulated mouse (second row), my fitted value would be:
Intercept + StrainWT:TreatmentS
if I was a mutant, unstimulated mouse (third row), my fitted value would be:
Intercept + StrainMu
if I was a mutant, stimulated mouse (fourth and fifth rows), my fitted value would be:
Intercept + StrainMu + StrainMU:TreatmentS
So the difference between the stimulated and unstimulated group within the mutant strain is:
(Intercept + StrainMU + StrainMU:TreatmentS) - (Intercept + StrainMU)
= StrainMU:TreatmentS
... the coefficient corresponding to the 4th column in the design matrix
Hopefully that was helpful
ps, using coef=4
should give you the same result as using contrast = c(0, 0, 0, 1)