rlimma

Meaning of coef in limma


tl;dr

why isn't fit <- eBayes(fit); topTable(fit, coef=4) the same as fit <- contrasts.fit(fit, c(-1,0,0,1)); fit <- eBayes(fit); topTable(fit) (column 1 of the design beeing the intercept)?

Example from the limma usersguide

Strain <- factor(targets$Strain, levels=c("WT","Mu"))
Treatment <- factor(targets$Treatment, levels=c("U","S"))
design <- model.matrix(~Strain+Strain:Treatment)
colnames(design)
[1] "(Intercept)" "StrainMu" "StrainWT:TreatmentS" "StrainMu:TreatmentS"

The first term in the model formula is an effect for Strain. This introduces an intercept column to the design matrix, which estimates the average log-expression level for wild-type unstimulated cells, and a column for Strain which estimates the mutant vs wildtype di erence in the unstimulated state. The second term in the model formula represents the interaction between stimulation and strain. [...] It introduces a third and a fourth column to the design matrix which represent the effect of stimulation for wild-type and for mutant mice respectively [...].

fit <- lmFit(eset, design)
fit <- eBayes(fit)
topTable(fit, coef=3)
# will find those genes responding to stimulation in wild-type mice, and
topTable(fit, coef=4)
# will find those genes responding to stimulation in mutant mice

What I don't understand

If using coef is the same as looking at the difference between the 4th column of the design matrix and the intercept (i.e. the contrast between the fourth and first column), wouldn't we need to look at the contrast between the fourth and second column to get the genes responding to stimulation in mutant mice?

Of course I compared the results when using coef and when using contrasts. They differ but I do not understand why... Obviously it means that coef=4 does not mean "look at the difference between column 4 and the intercept", but what does it mean then?

I hope that the question is understandable. Many thanks in advance!


Solution

  • The design matrix is based on

    targets <- data.frame(
        Strain = factor(c("WT", "WT", "MU", "MU", "MU"), levels = c("WT", "MU")),
        Treatment = factor(c("U", "S", "U", "S", "S"), levels = c("U", "S")))
    design <- model.matrix(~ Strain + Strain:Treatment, data = targets)
    
    > targets
    ##   Strain Treatment
    ## 1     WT         U
    ## 2     WT         S
    ## 3     MU         U
    ## 4     MU         S
    ## 5     MU         S
    

    Each row of targets corresponds to an experimental sample. The design matrix looks like this:

    ##      (Intercept) StrainMU StrainWT:TreatmentS StrainMU:TreatmentS
    ##    1           1        0                   0                   0
    ##    2           1        0                   1                   0
    ##    3           1        1                   0                   0
    ##    4           1        1                   0                   1
    ##    5           1        1                   0                   1
    

    Again, each row corresponds to an experimental sample. The columns of design correspond to coefficients that are fitted by limma and you can read off what combination of coefficients gives the model-fitted value for a given experimental group by comparing the rows of design with those of targets.

    Looking at coef=4 effectively means that you're testing the null hypothesis that the fourth coefficient (that for StrainMu:TreatmentS) is zero - it isn't the same as comparing the value of the fourth coefficient against the value of the intercept coefficient.

    Think in terms of the fitted value for each of the experimental classes. For a given gene,

    So the difference between the stimulated and unstimulated group within the mutant strain is:

    (Intercept + StrainMU + StrainMU:TreatmentS) - (Intercept + StrainMU) = StrainMU:TreatmentS

    ... the coefficient corresponding to the 4th column in the design matrix

    Hopefully that was helpful


    ps, using coef=4 should give you the same result as using contrast = c(0, 0, 0, 1)