rmixed-models

Assigning group-specific dispersion parameters in a negative binomial mixed model


I have (overdispersed) count data that are collected at different facilities that I want to model using a negative binomial random effect model, with a random effect for facility. On further inspection of the data, overdispersion varies widely for each facility (ie, the distribution of the outcome in some facilities approximates Poisson, others are highly overdispersed, some are in between), ie:

enter image description here

I would like to specify the dispersion for each group within the model. I tried using the GLMMadaptive::mixed_model, as it has parameters n_phis and initial_values which the documentation specifies allows you to put in how many parameters and their initial values, so I estimated the dispersion parameter (szs) via MASS:fitdistr and tried to plug them in:

szs <- sapply(unique(dta$facility), \(i) MASS::fitdistr(dta$outcome[dta$facility == i], "negative binomial")[[1]][1])

GLMMadaptive::mixed_model(fixed = outcome ~ phase,
                          random = ~ 1|facility,
                          data = dta,
                          n_phis = length(unique(dta$facility)),
                          initial_values = szs, # also tried initial_values = c(phis = szs) given the function documentation
                          family = GLMMadaptive::negative.binomial())

However that was unsuccessful as it still models using a single dispersion parameter (removing the n_phis and initial_values commands returns the exact same model). I did not see any options for this type of thing in lme4::glmer.nb.

After some searching, I see that this can be done via a Bayesian approach in the brms package (Cross Validated questions here and here), but can this be done using packages that use other methods (ie, GLMMadaptive::mixed_model, lme4::glmer.nb, or another package)? To be clear, my question is about the coding implementation, not the underlying statistical concept.


Data

set.seed(123)
n <- 100
dta <- data.frame(facility = rep(LETTERS[1:4], each = n),
                  phase = rep(c("Intervention", "Control"), each = n / 2),
                  outcome = c(rnbinom(n, mu = 5, size = 0.02), 
                              rpois(n, lambda = 5),
                              rpois(n, lambda = 8), 
                              rnbinom(n, mu = 8, size = 0.5)))

Solution

  • You can definitely do this in glmmTMB, with dispformula (see code and results below).

    library(glmmTMB)
    m <- glmmTMB(outcome ~ phase + (1|facility),
                 dispformula  = ~ facility,
                 data = dta,
                 family = nbinom2)
    
    Formula:          outcome ~ phase + (1 | facility)
    Dispersion:               ~facility
    Data: dta
          AIC       BIC    logLik  df.resid 
    1684.8276 1712.7678 -835.4138       393 
    Random-effects (co)variances:
    
    Conditional model:
     Groups   Name        Std.Dev.
     facility (Intercept) 0.2193  
    
    Number of obs: 400 / Conditional model: facility, 4
    
    Fixed Effects:
    
    Conditional model:
          (Intercept)  phaseIntervention  
              1.89473           -0.01547  
    
    Dispersion model:
    (Intercept)    facilityB    facilityC    facilityD  
         -4.279        9.191       10.154        3.657