r, regression, stata, lme4, mixed-models

Logistic mixed model convergence warnings in R but not Stata


I am using univariate logistic mixed models in lme4 to investigate the effect of plasma parameters (M) on disease outcome, with covariates:

formula <- Disease ~ Age + Sex + BMI + M + (1 | Family)

The sample sizes vary depending on the availability of information about disease status, but are in the range of 1000-3000 (the number of groups is around 2/3 of the total sample size).

The model I am using is:

glmer(formula, data = df, family = binomial,
      control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5)))

For some combinations of Disease and M, I get convergence warnings. For some diseases I only get the warning for a few plasma parameters, whereas for others I get it on almost half.

These warnings persist after following the advice in the lme4 convergence-warnings troubleshooting guide: scaling the variables, increasing the tolerance, trying different optimizers, and restarting from a previous fit.
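For reference, those troubleshooting steps look roughly like this. This is a sketch on simulated data: the structure of `df` and the variable names (Age, Sex, BMI, M, Family, Disease) are taken from the formula above, and the choice of `nloptwrap` as the alternative optimizer is just one example.

```r
set.seed(1)
# simulated stand-in for the real data (assumption: structure as in the question)
n <- 500
df <- data.frame(
  Age     = rnorm(n, 50, 10),
  Sex     = factor(sample(c("F", "M"), n, replace = TRUE)),
  BMI     = rnorm(n, 25, 4),
  M       = rlnorm(n),
  Family  = factor(sample(seq(300), n, replace = TRUE)),
  Disease = rbinom(n, 1, 0.2)
)

## 1. scale continuous predictors so they are on comparable scales
num <- c("Age", "BMI", "M")
df[num] <- scale(df[num])

## 2./3. increase iteration limits / try a different optimizer (needs lme4)
if (requireNamespace("lme4", quietly = TRUE)) {
  library(lme4)
  ctrl <- glmerControl(optimizer = "nloptwrap",
                       optCtrl = list(maxfun = 2e5))
  fit <- glmer(Disease ~ Age + Sex + BMI + M + (1 | Family),
               data = df, family = binomial, control = ctrl)
  ## 4. restart from the previous fit's parameter estimates
  fit2 <- update(fit, start = list(theta = getME(fit, "theta"),
                                   fixef = getME(fit, "fixef")))
}
```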

There are few instances of males with some of the diseases (checked via table(df$Sex, df$Disease)); however, I do not get perfect/quasi-perfect separation warnings.
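A quick way to screen for this systematically is to cross-tabulate the outcome against each categorical predictor and flag empty cells, which can produce (quasi-)separation even when no explicit warning is issued. A minimal sketch (the helper name `has_empty_cells` and the toy data are illustrative, not from the original analysis):

```r
# flag predictor/outcome combinations with empty cells in the cross-tab
has_empty_cells <- function(x, y) any(table(x, y) == 0)

# toy example: no males with disease status 1 -> empty (M, 1) cell
sex     <- c("F", "F", "M", "M", "F")
disease <- c(0,   1,   0,   0,   1)
has_empty_cells(sex, disease)  # TRUE
```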

Questions:

  1. If my colleague uses the same dataset and model formula to run the analysis in Stata, they do not get any warnings. How could this be?

  2. I would prefer not to switch away from R, for practical reasons. Is there anything else I could do to resolve these issues in R?


Solution

  • This information is probably repeated from somewhere, but:

    This is an example from a previous SO question, updated.

    df <- read.csv("http://www.math.mcmaster.ca/bolker/misc/SO23478792_dat.csv")
    
    library(lme4)
    library(broom.mixed)
    library(dplyr)
    library(ggplot2); theme_set(theme_bw())
    
    df <- transform(df, SUR.ID = factor(SUR.ID),
                    replicate = factor(replicate),
                    Unit = factor(seq(nrow(df))))
m1 <- glmer(cbind(ValidDetections, FalseDetections) ~ tm:Area + tm:c.distance +
            c.distance:Area + c.tm.depth:Area +
            c.receiver.depth:Area + c.temp:Area +
            c.wind:Area +
            c.tm.depth + c.receiver.depth +
            c.temp + c.wind + tm + c.distance + Area +
            replicate +
            (1 | SUR.ID) + (1 | Day) + (1 | Unit),
            data = df, family = binomial)
    
    
    ## rescale
    numcols <- grep("^c\\.",names(df))
    dfs <- df
    dfs[,numcols] <- scale(dfs[,numcols])
    m4 <- update(m1, data=dfs)
    
    aa <- allFit(m4)
    

    Using functions from broom.mixed:

    glance(aa) |> select(optimizer, NLL_rel) |> arrange(NLL_rel)
    

    This indicates that, apart from the two optimizers that failed outright (NLL_rel of Inf), all of them reached very similar overall fits: a difference of 0.016 on the log-likelihood scale is small:

      optimizer                        NLL_rel
      <chr>                              <dbl>
    1 nlminbwrap                      0       
    2 bobyqa                          1.32e-10
    3 optimx.L-BFGS-B                 4.94e- 5
    4 nloptwrap.NLOPT_LN_BOBYQA       3.10e- 3
    5 Nelder_Mead                     1.63e- 2
    6 nmkbw                         Inf       
    7 nloptwrap.NLOPT_LN_NELDERMEAD Inf       
    
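    Before comparing estimates it can help to drop the optimizers that failed outright. A sketch on a stand-in for the `glance(aa)` output (the data frame below just reproduces the `optimizer`/`NLL_rel` columns from the table above):

    ```r
    # stand-in for glance(aa)'s optimizer and NLL_rel columns
    gl <- data.frame(
      optimizer = c("nlminbwrap", "bobyqa", "optimx.L-BFGS-B",
                    "nloptwrap.NLOPT_LN_BOBYQA", "Nelder_Mead",
                    "nmkbw", "nloptwrap.NLOPT_LN_NELDERMEAD"),
      NLL_rel   = c(0, 1.32e-10, 4.94e-5, 3.10e-3, 1.63e-2, Inf, Inf)
    )

    # keep only optimizers with a finite relative negative log-likelihood
    ok <- gl[is.finite(gl$NLL_rel), ]
    nrow(ok)         # 5 optimizers actually converged
    max(ok$NLL_rel)  # worst surviving fit is ~0.016 log-likelihood units off
    ```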

    Look at the fixed-effect estimates ± 2 standard errors (assuming you're interested in the fixed effects; you could also inspect the random-effect [variance/covariance/standard-deviation/correlation] parameter estimates):

    tt <- tidy(aa, effects = "fixed") |>
        filter(term != "(Intercept)") |>
        select(optimizer, term, estimate, std.error)
    ggplot(tt, aes(x = estimate, y = term, colour = optimizer)) +
        geom_pointrange(aes(xmin=estimate-2*std.error, xmax = estimate+2*std.error),
                        position = position_dodge(width = 0.2)) +
        facet_wrap(~term, scales = "free") +
        theme(axis.text.y = element_blank())
    

    [figure: fixed-effect estimates ± 2 standard errors for each optimizer, faceted by term]

    Relative to the magnitude of the uncertainty, the differences among the optimizers are tiny. Or you could look at the absolute differences:

    ggplot(tt, aes(x = estimate, y = term, colour = optimizer)) +
        geom_point(position = position_dodge(width = 0.2)) +
        facet_wrap(~term, scales = "free") +
        theme(axis.text.y = element_blank())
    

    You would have to decide for yourself whether differences of this magnitude matter for your problem. (In this particular case I suspect there is also some complete separation going on, possibly because only a small subset of the data was used for the example.)
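    One way to make that decision concrete is to compute, per term, the spread of estimates across optimizers in standard-error units. A sketch on a `tidy()`-like data frame (the numbers below are made up for illustration, not taken from the fit above):

    ```r
    # stand-in for tidy(aa, effects = "fixed"): two terms, three
    # optimizers, with slightly different estimates (made-up numbers)
    tt <- data.frame(
      term      = rep(c("tm", "c.distance"), each = 3),
      optimizer = rep(c("bobyqa", "Nelder_Mead", "nlminbwrap"), 2),
      estimate  = c(1.00, 1.01, 1.005, -0.50, -0.52, -0.51),
      std.error = c(0.30, 0.30, 0.30, 0.20, 0.20, 0.20)
    )

    # per term: range of estimates divided by the (average) standard error
    spread <- sapply(split(tt, tt$term), function(d)
      diff(range(d$estimate)) / mean(d$std.error))
    round(spread, 3)
    # both terms are far below 1 SE apart, so the disagreement is negligible
    ```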

    [figure: fixed-effect point estimates for each optimizer, faceted by term]