I am confused about the relationship between the significance tests shown in the output of summary() called on an lm or lmer object and those shown in the output of anova() called on the same object. Specifically, I don't understand (a) why, for terms with df=1 (for which a direct comparison should be possible), the results don't always agree; and (b) why summary() respects the contrast weights assigned to each factor but anova() does not.
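For reference, the two codings I compare throughout are plain treatment coding and sum coding scaled by 1/2. Printing the contrast matrices for a three-level factor shows exactly which weights each coefficient gets (nothing here beyond base R):
# the two codings used below, shown for a 3-level factor
contr.treatment(3)   # dummy coding; level 1 is the reference
contr.sum(3)/2       # sum-to-zero coding, scaled so the entries are +/-0.5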
Here is an example for lm:
data(iris)
## Apply treatment coding to Species, and fit model
contrasts(iris$Species) <- contr.treatment(length(levels(iris$Species)))
iris.lm.treatment <- lm(Sepal.Length ~ Petal.Length * Species, data=iris)
# check Petal.Length p-value in lm() output
coef(summary(iris.lm.treatment))["Petal.Length","Pr(>|t|)"]
[1] 0.05199902
# check Petal.Length p-value in anova() output
as.data.frame(anova(iris.lm.treatment))["Petal.Length","Pr(>F)"]
[1] 1.244558e-56
## Apply sum coding to Species, and fit model
contrasts(iris$Species) <- contr.sum(length(levels(iris$Species)))/2
iris.lm.sum <- lm(Sepal.Length ~ Petal.Length * Species, data=iris)
# check Petal.Length p-value in lm() output
coef(summary(iris.lm.sum))["Petal.Length","Pr(>|t|)"]
[1] 2.091453e-12
# check Petal.Length p-value in anova() output
as.data.frame(anova(iris.lm.sum))["Petal.Length","Pr(>F)"]
[1] 1.244558e-56
The significance test of Petal.Length in the fitted lm changes when the contrast coding of Species changes. That makes sense, because the Petal.Length coefficient is estimated with the other contrast-coded predictors held at zero, and what "zero" corresponds to depends on the coding. However, the significance test of Petal.Length in the anova() result is the same either way, and does not match the result from either fit.
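One way I convinced myself of the first point, using the models fitted above (and assuming the default level ordering, so setosa is the reference level): under treatment coding, the Petal.Length coefficient is just the slope within the reference species, so its estimate should match a setosa-only regression.
# under treatment coding, the Petal.Length coefficient is the setosa-only slope
coef(iris.lm.treatment)["Petal.Length"]
coef(lm(Sepal.Length ~ Petal.Length, data = subset(iris, Species == "setosa")))["Petal.Length"]
# the estimates agree; the p-values differ only because the full model pools
# residual variance across all three species
As far as I can tell, under the scaled sum coding the Petal.Length coefficient instead works out to the unweighted average of the within-species slopes, which is why its test comes out so much stronger.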
The behavior with lmer (with significance testing accomplished via the lmerTest package) is confusing in a related way:
library(lmerTest)
data(ham)
## Apply treatment coding to variables, and fit model
contrasts(ham$Information) <- contr.treatment(length(levels(ham$Information)))
contrasts(ham$Product ) <- contr.treatment(length(levels(ham$Product )))
ham.lmer.treatment <- lmer(Informed.liking ~ Information * Product + (1 | Consumer) + (1 | Consumer:Product), data=ham)
# check Information p-value in lmer() output
coef(summary(ham.lmer.treatment))["Information2","Pr(>|t|)"]
[1] 0.4295516
# check Information p-value in anova() output
as.data.frame(anova(ham.lmer.treatment))["Information","Pr(>F)"]
[1] 0.04885354
## Apply sum coding to variables, and fit model
contrasts(ham$Information) <- contr.sum(length(levels(ham$Information)))/2
contrasts(ham$Product ) <- contr.sum(length(levels(ham$Product )))/2
ham.lmer.sum <- lmer(Informed.liking ~ Information * Product + (1 | Consumer) + (1 | Consumer:Product), data=ham)
# check Information p-value in lmer() output
coef(summary(ham.lmer.sum))["Information1","Pr(>|t|)"]
[1] 0.04885354
# check Information p-value in anova() output
as.data.frame(anova(ham.lmer.sum))["Information","Pr(>F)"]
[1] 0.04885354
Here, too, the variable coding affects the results shown by summary() but not those shown by anova(). This time, however, both anova() results match the summary() result obtained under sum coding.
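A quick way to check that this is not specific to the Information row, using the two models already fitted above, is to compare the full tables (I haven't shown the output; the point is just that the check is easy to run):
# does the whole anova() table, not just the Information row, ignore the coding?
all.equal(as.data.frame(anova(ham.lmer.treatment)),
          as.data.frame(anova(ham.lmer.sum)))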
It seems to me that in both cases, anova() is ignoring the variable codings used and using some other variable coding – which, in the lmer case, appears to be sum coding – to evaluate significance. I would like to know how to perform a statistical test that uses the assigned variable codings. For lmer, at least, I can accomplish this with contestMD(); e.g.,
# test Information significance while respecting contrast weights
contestMD(ham.lmer.treatment, as.numeric(names(fixef(ham.lmer.treatment))=="Information2"))[,"Pr(>F)"]
[1] 0.4295516 # matches output from summary(ham.lmer.treatment)
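I assume, though I have only verified the 1-df case, that the same approach extends to effects with df>1 by passing contestMD() a contrast matrix with one row per coefficient involved. A sketch for Product, assuming (as appears to be the case here) that the relevant main-effect coefficients are exactly the ones whose names begin with "Product":
# build an indicator matrix selecting the Product main-effect coefficients
p   <- length(fixef(ham.lmer.treatment))
idx <- grep("^Product", names(fixef(ham.lmer.treatment)))
L   <- matrix(0, nrow = length(idx), ncol = p)
L[cbind(seq_along(idx), idx)] <- 1
# joint F test of those coefficients, respecting the assigned coding
contestMD(ham.lmer.treatment, L)[, "Pr(>F)"]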
However, I haven't found the equivalent test for lm (presumably glht() from the multcomp package could do it, but I can't work out the right call). So, my questions are:
1. Conceptually, why does anova() not respect the assigned variable codings? (Presumably this is all intended behavior, but the reason is opaque to me.)
2. Practically, what variable coding is anova() using when called on an lm object?
3. How can I perform the kind of significance testing I want with an lm object? (I used df=1 examples above because they allow a direct comparison between the model output and the anova() output, but what I really want is to test effects with df>1.)
I still haven't answered my first two questions, but in answer to the third one, it seems I can get the results I want by creating reduced models – each with one term removed – and comparing each one to the full model using anova(). For the example above (iris.lm.treatment), I could do the following. (I've gone to the trouble of first re-fitting the model with explicitly numeric predictors, because I otherwise run into difficulties when using anova() to compare the models; see the illustration just below.)
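To illustrate the difficulty I mean (this is my understanding of R's formula handling, so treat it as a hedged explanation): if I simply drop the Petal.Length main effect from the factor-based model, R re-codes the Petal.Length:Species interaction with full dummies for Species, so the "reduced" model spans exactly the same columns as the full one and the comparison has zero degrees of freedom.
# naive attempt: drop the Petal.Length main effect from the factor-based model
anova(update(iris.lm.treatment, . ~ . - Petal.Length), iris.lm.treatment)
# both models end up with the same residual df (the re-coded interaction
# absorbs the dropped slope), so no usable F test is produced
Re-fitting with explicitly numeric predictors avoids this, because dropping a numeric column really does shrink the design.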
# re-apply treatment coding, since it was overwritten by the sum coding above
contrasts(iris$Species) <- contr.treatment(length(levels(iris$Species)))
# create numeric columns with the same contrast coding as the nominal factor
Species.numeric <- as.data.frame(model.matrix(~ Species, data=iris))
# drop Intercept column
Species.numeric <- Species.numeric[,2:ncol(Species.numeric)]
# rename columns as Species.num1 & Species.num2 and append to iris
names(Species.numeric) <- paste0("Species.num", 1:ncol(Species.numeric))
iris <- cbind(iris, Species.numeric)
# re-fit lm with all numeric predictors
iris.lm.treatment.num <- lm(Sepal.Length ~ Petal.Length * (Species.num1 + Species.num2), data=iris)
# for each factor, create a subset model that has that factor removed
iris.lm.treatment.num.noPetalLength <- update(iris.lm.treatment.num, . ~ . - Petal.Length )
iris.lm.treatment.num.noSpecies <- update(iris.lm.treatment.num, . ~ . - (Species.num1 + Species.num2) )
iris.lm.treatment.num.noInteraction <- update(iris.lm.treatment.num, . ~ . - Petal.Length:(Species.num1 + Species.num2))
# use anova() to compare each subset model to the full model
anova(iris.lm.treatment.num.noPetalLength, iris.lm.treatment.num) # p = .052
anova(iris.lm.treatment.num.noSpecies, iris.lm.treatment.num) # p = 7.611e-06
anova(iris.lm.treatment.num.noInteraction, iris.lm.treatment.num) # p = .1895
The model comparison for Petal.Length yields p = .052, matching the Petal.Length p-value from summary(iris.lm.treatment); the Species comparison is the kind of multi-df test (here df = 2) that I was after.
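As a sanity check, using only the models already fitted above: for a single-df term, the F statistic from the model comparison should be the square of the t statistic from summary(), because the numeric re-fit is just a reparameterization of iris.lm.treatment.
# F from the model comparison vs. squared t from summary(); these should agree
anova(iris.lm.treatment.num.noPetalLength, iris.lm.treatment.num)[2, "F"]
coef(summary(iris.lm.treatment))["Petal.Length", "t value"]^2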