I'm currently estimating a hierarchical linear model (HLM) using lme4. My entire dataset has 367 observations. lme4 estimated my model using 341 observations - I assumed some were dropped due to missing data. However, when I sum the complete cases on the model's variables, I end up with 337 observations. This is making it difficult to test for assumptions when the model is a different length than the dataset.
There is a discrepancy between complete cases and the observations 'used' by lme4.
As described, I tried to remove missing data from my main dataset assuming lme4 drops cases listwise. I've tried checking each variable for its missingness to see if lme4 was just testing a certain variable, but none match up with lme4's output estimate of 341.
If needed, I can provide the anonymized dataset - but hoping there's something easy I'm not aware of!
The most obvious reason would be that lme4
(and internally model.frame
) assesses completeness only on the basis of the variables that are actually used in the model. Do you have NA
values in variables that are not included in the model formula?
(For what it's worth, this default also means that it's a good idea to filter the full data set for complete cases first if you are going to fit a series of models with different subsets of predictors and want the models to be comparable ...)
Example:
library(lme4)
ss <- sleepstudy
ss$Days[c(1, 4, 10)] <- NA
ss$extra <- 1
ss$extra[c(4, 20, 25)] <- NA
nrow(ss) ## 180
sum(complete.cases(ss)) ## 175; drops obs with missing `extra`
m <- lmer(Reaction ~ Days + (1|Subject), ss, na.action = na.omit)
mf <- model.frame(m)
nrow(mf) ## 177; ignores variables not in the model
Which rows were deleted? Two ways to check:
attr(mf, "na.action")
1 4 10
1 4 10
attr(,"class")
[1] "omit"
setdiff(rownames(ss), rownames(mf))
[1] "1" "4" "10"
The equivalent to the internal code is
nrow(model.frame(Reaction ~ Days + Subject, ss)) ## 177
(the formula gets processed so that the random effects grouping variables are also included in the formula for the purpose of excluding incomplete cases)