I am using univariate logistic mixed models with lme4 to investigate the effect of plasma parameters (M) on disease outcome, with covariates:
formula <- Disease ~ Age + Sex + BMI + M + (1 | Family)
The sample sizes vary depending on the availability of information about disease status, but are in the range of 1000-3000 (the number of groups is around 2/3 of the total sample size).
The model I am using is:
glmer(formula, data = df, family = binomial,
      control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5)))
For some combinations of Disease and M, I get convergence warnings. For some diseases I only get the warning for a few plasma parameters, whereas for others I get it for almost half.
These warnings remain after following the advice in the "lme4 convergence warnings: troubleshooting" guide (scaling the variables, increasing the tolerance, trying different optimizers, restarting from a previous fit); the restart step is sketched below.
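For reference, the restart follows the pattern from that guide: extract the parameters from an earlier fit and use them as starting values. This is a sketch; m_prev stands for one of my previous fits:

ss <- getME(m_prev, c("theta", "fixef"))  ## parameters from the previous fit
m_restart <- update(m_prev, start = ss,
                    control = glmerControl(optimizer = "bobyqa",
                                           optCtrl = list(maxfun = 2e5)))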
There are few instances of males with some of the diseases (checked with table(df$Sex, df$Disease)); however, I do not get perfect/quasi-perfect separation warnings.
Questions:
If my colleague uses the same dataset and model formula to run the analysis in Stata, they do not get any warnings. How could this be?
For practical reasons I would prefer not to switch away from R. Is there anything else I could do to resolve these issues in R?
This information is probably repeated from somewhere, but:

- The convergence tests in lme4 are overly sensitive. The maintainers are well aware of this, and have considered dumping these particular tests altogether, but without an enormous amount of work it's hard to be sure that you wouldn't be missing important cases where the convergence warnings were justified. The convergence warnings are a hint that something about your analysis might be numerically unstable, not a definite sign of problems.
- Since each of your models has only a single scalar random effect, you can use the more accurate adaptive Gauss-Hermite quadrature rather than the default Laplace approximation (set nAGQ to a value greater than 1, say 10 or 20); see the sketch after this list.
- You can also cross-check your fits with other packages such as GLMMadaptive and glmmTMB (although the latter can't do GHQ, see previous point ...).
- The gold standard for convergence testing is allFit() (see example below); the point is not to see if there is a particular optimizer that runs without convergence warnings, but to see if the results from all optimizers [or at least those that achieve similar log-likelihoods] are close enough for your application.
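To make the first two suggestions concrete, here is a minimal sketch using the variable names from your question (df, Disease, M, Family); treat the GLMMadaptive and glmmTMB calls as illustrations rather than drop-in equivalents:

library(lme4)
library(GLMMadaptive)
library(glmmTMB)

## lme4 with adaptive Gauss-Hermite quadrature instead of the default
## Laplace approximation; nAGQ > 1 is possible here because the model
## has a single scalar random effect
m_agq <- glmer(Disease ~ Age + Sex + BMI + M + (1 | Family),
               data = df, family = binomial, nAGQ = 10,
               control = glmerControl(optimizer = "bobyqa",
                                      optCtrl = list(maxfun = 2e5)))

## GLMMadaptive always uses adaptive quadrature (its nAGQ default is 11)
m_ga <- mixed_model(fixed = Disease ~ Age + Sex + BMI + M,
                    random = ~ 1 | Family,
                    data = df, family = binomial())

## glmmTMB only does the Laplace approximation (no GHQ), but its
## different implementation may converge cleanly where glmer warns
m_tmb <- glmmTMB(Disease ~ Age + Sex + BMI + M + (1 | Family),
                 data = df, family = binomial)

The allFit() example that follows is from a previous SO question, updated.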
df <- read.csv("http://www.math.mcmaster.ca/bolker/misc/SO23478792_dat.csv")
library(lme4)
library(broom.mixed)
library(dplyr)
library(ggplot2); theme_set(theme_bw())
df <- transform(df, SUR.ID = factor(SUR.ID),
replicate = factor(replicate),
Unit = factor(seq(nrow(df))))
m1 <- glmer(cbind(ValidDetections, FalseDetections) ~ tm:Area + tm:c.distance +
c.distance:Area + c.tm.depth:Area +
c.receiver.depth:Area + c.temp:Area +
c.wind:Area +
c.tm.depth + c.receiver.depth +
c.temp + c.wind + tm + c.distance + Area +
replicate +
(1|SUR.ID) + (1|Day) + (1|Unit) ,
data = df, family = binomial)
## rescale
numcols <- grep("^c\\.",names(df))
dfs <- df
dfs[,numcols] <- scale(dfs[,numcols])
m4 <- update(m1, data=dfs)
aa <- allFit(m4)
Using functions from broom.mixed:
glance(aa) |> select(optimizer, NLL_rel) |> arrange(NLL_rel)
This indicates that all of the optimizers that converged gave very similar overall fits (a difference of 0.016 on the log-likelihood scale is small); the two with NLL_rel of Inf failed entirely. NLL_rel is each optimizer's negative log-likelihood relative to the best fit:
optimizer NLL_rel
<chr> <dbl>
1 nlminbwrap 0
2 bobyqa 1.32e-10
3 optimx.L-BFGS-B 4.94e- 5
4 nloptwrap.NLOPT_LN_BOBYQA 3.10e- 3
5 Nelder_Mead 1.63e- 2
6 nmkbw Inf
7 nloptwrap.NLOPT_LN_NELDERMEAD Inf
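One way to implement the bracketed suggestion above (keep only the optimizers that achieved similar log-likelihoods) is to filter on NLL_rel; this is a sketch, and the 0.05 cutoff is arbitrary:

ok <- glance(aa) |>
    filter(is.finite(NLL_rel), NLL_rel < 0.05) |>
    pull(optimizer)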
Look at the fixed-effect estimates, ± 2 standard errors (assuming that you're interested in the fixed effects; you could also investigate the random-effect [variance/covariance/std dev/correlation] parameter estimates):
tt <- tidy(aa, effects = "fixed") |>
    filter(term != "(Intercept)") |>
    select(optimizer, term, estimate, std.error)
ggplot(tt, aes(x = estimate, y = term, colour = optimizer)) +
geom_pointrange(aes(xmin=estimate-2*std.error, xmax = estimate+2*std.error),
position = position_dodge(width = 0.2)) +
facet_wrap(~term, scales = "free") +
theme(axis.text.y = element_blank())
Relative to the magnitude of the uncertainty, the differences among the optimizers are tiny. Or you could look at the absolute differences:
ggplot(tt, aes(x = estimate, y = term, colour = optimizer)) +
geom_point(position = position_dodge(width = 0.2)) +
facet_wrap(~term, scales = "free") +
theme(axis.text.y = element_blank())
You would have to decide for yourself whether these magnitudes of difference were important for your problem. (In this particular case I suspect there are also some complete-separation problems happening, possibly because a small subset of the data was used for the example.)
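If you want to check the complete-separation possibility directly, the detectseparation package can test the fixed-effects part of a logistic model. A heuristic sketch using your question's variable names (dropping the random effect means this is only indicative for the mixed model):

library(detectseparation)
## refit the fixed-effects-only model with the separation check;
## printing the result reports whether separation was detected and
## which coefficient estimates would be infinite
sep_check <- glm(Disease ~ Age + Sex + BMI + M, data = df,
                 family = binomial, method = "detect_separation")
sep_check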