I am fairly new to R, and have recently started using it to analyse some microarray data. The overall aim of the analysis is to take DC2 and compare the WT vs KO groups in this population. But I have come across some problems with the limma processing. After processing the data using the oligo package, I have then tried to create a design matrix for analysis using limma. This is my workflow for the ExpressionSet of DC2:
pData(DC2)
index filename genotype cell_type
1 KO DC2 2 HP10.CEL KO DC2
2 KO DC2 3 HP11.CEL KO DC2
3 KO DC2 4 HP12.CEL KO DC2
1 WT DC2 10 HP7.CEL WT DC2
2 WT DC2 11 HP8.CEL WT DC2
3 WT DC2 12 HP9.CEL WT DC2
design <- model.matrix(~DC2$genotype)
design
(Intercept) DC2$genotypeWT
1 1 0
2 1 0
3 1 0
4 1 1
5 1 1
6 1 1
fit <- lmFit(DC2, design)
fit <- eBayes(fit)
toptable(fit)
This feeds out a list of genes as follows:
logFC t P.Value adj.P.Val B
17551163 14.09722 208.2627 2.990326e-13 2.700912e-10 17.14467
17511316 13.91167 205.0811 3.292503e-13 2.700912e-10 17.12716
17551167 13.92093 204.5801 3.343243e-13 2.700912e-10 17.12434
17375373 13.76320 202.1271 3.605170e-13 2.700912e-10 17.11025
17550685 13.74022 201.5428 3.671032e-13 2.700912e-10 17.10682
However, when I go to check this manually (just taking the first feature) using this code:
toptable(fit, n=1)
genename <- rownames(toptable(fit, n=1))
typeMean <- tapply(exprs(DC2)[genename,], DC2$genotype, mean)
typeMean["KO"] - typeMean["WT"]
The output for the same feature "17551163" is different
KO
0.04538317
I have tried to search around for an answer, but with no luck. I'm assuming that this may be something to do with the matrix design? Any help would be greatly appreciated.
Thanks
An answer, for those who skip reading the discussion in comments below the question.
After performing the estimation with lmFit
and eBayes
we can question the top differentiating genes between all the contrasts that we provided in the model.matrix
step.
Here, the author created the design as follows: design <- model.matrix(~DC2$genotype)
. Keeping in mind that the (Intercept)
is the first coefficient if we want need to explicitly say that we want the contrast related to the DC2$genotype
, so the call should be:
toptable(fit, coef = 2)
Naturally, if the design contains more contrasts they are assigned consecutive natural numbers.
REMARK
If we want to remove the intercept from the design design <- model.matrix(~ -1 + DC2$genotype)
; the first coefficient is now DC2$genotype
.