I have a long-format dataset with data on about 70K individuals. The data track a continuous follow-up measure at 3, 6, and 12 months past baseline (a ±1 month buffer was given for each time point). Members only had to have one follow-up measure at any time point to be included in the study. There is complete data at baseline and 94% complete data at month 3; however, there is quite a lot of missingness at the later time points (60% missing at month 6, 87% missing at month 12).
| ID | time_point | continuous outcome |
|----|------------|---------------------|
| 1  | 0          | 7.5                 |
| 1  | 3          | 7.2                 |
| 1  | 6          | NA                  |
| 1  | 12         | 7.0                 |
I am using the `lmer` function in R's `lme4` package to run this model, with time as a factor variable in the fixed effects combined with a random slope (time as a continuous variable) and intercept in the random effects, treating each member as an individual cluster.
Code:

```r
m.unstructured <- lmer(outcome ~ time_factor + (1 + time | id),
                       data = df.long)
```
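For reference, `time_factor` and `time` are both derived from `time_point`; a minimal sketch, with column names as in the table above:

```r
# categorical time for the fixed effects (baseline 0 is the reference level)
df.long$time_factor <- factor(df.long$time_point, levels = c(0, 3, 6, 12))

# numeric copy of time for the random slope
df.long$time <- df.long$time_point
```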
I am expecting to get a model summary containing the estimated changes from baseline at each time point. However, the model fails to converge with the following error:
```
Warning message:
In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
  Model failed to converge with max|grad| = 0.00317116 (tol = 0.002, component 1)
```
I can only get the model to run when I specify an optimizer such as Nelder-Mead or bobyqa, e.g., `control = lmerControl(optimizer = "Nelder_Mead")`. I experienced similar issues when attempting to run this model with the `lme` function in the `nlme` package.
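For concreteness, the full calls look roughly like this (the `nlme` version is only a sketch of the specification I mean):

```r
library(lme4)
library(nlme)

# lme4: runs once an optimizer is specified explicitly
m.unstructured <- lmer(outcome ~ time_factor + (1 + time | id),
                       data = df.long,
                       control = lmerControl(optimizer = "Nelder_Mead"))

# nlme: roughly equivalent specification (a sketch)
m.nlme <- lme(outcome ~ time_factor,
              random = ~ 1 + time | id,
              data = df.long,
              na.action = na.omit)  # drop rows with missing outcomes
```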
My questions are: what is that argument doing, and why won't my model run without it? Is it due to the large sample size and extensive missingness? Can I assume this approach of switching optimizers produces valid results? Eventually I'd like to compare covariance structures and add additional fixed effects such as sex and age, but until I understand how to run my model, I am stuck. Any guidance you could provide is appreciated!
There is a lot of information about this in the `lme4` documentation and auxiliary material.
More specifically, this page illustrates that the convergence-checking machinery starts to get unreliable at around 10,000 observations (you have well over 100K observations; an 'observation' here is a row in the data frame, i.e. a subject:time_point combination).
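You can confirm the count directly from the data and the fitted model:

```r
nrow(df.long)         # total rows in the long data frame (including NA outcomes)
nobs(m.unstructured)  # rows actually used in the fit (NA outcomes are dropped)
```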
What you are calling an error is not technically an error, it's a warning; this is an important distinction (see here for more info). If you really had an error, you wouldn't be able to retrieve a result; since it's a warning, you can.
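To see the distinction in practice: the warning is issued after the optimizer has finished, so the fitted object is returned and fully usable (a sketch using the model from the question; the slot holding the stored messages is an `lme4` internal):

```r
fit <- suppressWarnings(
  lmer(outcome ~ time_factor + (1 + time | id), data = df.long)
)
fixef(fit)                      # the estimates are available despite the warning
fit@optinfo$conv$lme4$messages  # the convergence messages are stored on the fit
```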
My suggestions:

- use `allFit()` to try your model with all of the available optimizers, and check that the results are sufficiently similar (with respect to the estimates of `time_factor` or the random effects variances, and to whatever tolerance is important to your application)
- use `lmerControl(calc.derivs = FALSE)` to suppress the (error-prone) derivative calculation, which will save you from further warnings (!) and save time ... (a sketch of both steps follows below)
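A sketch of both suggestions, assuming the fitted model `m.unstructured` from the question (`allFit()` will also try optimizers from the `optimx` and `dfoptim` packages if they are installed):

```r
library(lme4)

## 1. refit with every available optimizer and compare the answers
fits <- allFit(m.unstructured)
ss   <- summary(fits)
ss$fixef   # fixed-effect estimates (the time_factor contrasts) per optimizer
ss$sdcor   # random-effect SDs and correlations per optimizer

## 2. refit without the error-prone derivative check
m.nocheck <- update(m.unstructured,
                    control = lmerControl(calc.derivs = FALSE))
```

If the estimates in `ss$fixef` and `ss$sdcor` agree across optimizers to a tolerance that matters for your application, the convergence warning can be treated as a false positive.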