I have (overdispersed) count data that are collected at different facilities that I want to model using a negative binomial random effect model, with a random effect for facility. On further inspection of the data, overdispersion varies widely for each facility (ie, the distribution of the outcome in some facilities approximates Poisson, others are highly overdispersed, some are in between), ie:
I would like to specify the dispersion for each group within the model. I tried using the GLMMadaptive::mixed_model
, as it has parameters n_phis
and initial_values
which the documentation specifies allows you to put in how many parameters and their initial values, so I estimated the dispersion parameter (szs
) via MASS:fitdistr
and tried to plug them in:
szs <- sapply(unique(dta$facility), \(i) MASS::fitdistr(dta$outcome[dta$facility == i], "negative binomial")[[1]][1])
GLMMadaptive::mixed_model(fixed = outcome ~ phase,
random = ~ 1|facility,
data = dta,
n_phis = length(unique(dta$facility)),
initial_values = szs, # also tried initial_values = c(phis = szs) given the function documentation
family = GLMMadaptive::negative.binomial())
However that was unsuccessful as it still models using a single dispersion parameter (removing the n_phis
and initial_values
commands returns the exact same model). I did not see any options for this type of thing in lme4::glmer.nb
.
After some searching, I see that this can be done via a Bayesian approach in the brms
package (Cross Validated questions here and here), but can this be done using packages that use other methods (ie, GLMMadaptive::mixed_model
, lme4::glmer.nb
, or another package)? To be clear, my question is about the coding implementation, not the underlying statistical concept.
Data
set.seed(123)
n <- 100
dta <- data.frame(facility = rep(LETTERS[1:4], each = n),
phase = rep(c("Intervention", "Control"), each = n / 2),
outcome = c(rnbinom(n, mu = 5, size = 0.02),
rpois(n, lambda = 5),
rpois(n, lambda = 8),
rnbinom(n, mu = 8, size = 0.5)))
You can definitely do this in glmmTMB
, with dispformula
(see code and results below).
n_phis
parameter in mixed_model()
doesn't do what you think it does: this describes the number of parameters required for dispersion and shape parameters for a particular family (e.g. 0 for Poisson, 1 for neg binom, 2 for Tweedie)glmmTMB
doesn't allow random effects in the dispersion model, e.g. dispformula = ~ 1|facility)
, even if that might sometimes make sense ...)VGAM::negbinomial()
...library(glmmTMB)
m <- glmmTMB(outcome ~ phase + (1|facility),
dispformula = ~ facility,
data = dta,
family = nbinom2)
Formula: outcome ~ phase + (1 | facility)
Dispersion: ~facility
Data: dta
AIC BIC logLik df.resid
1684.8276 1712.7678 -835.4138 393
Random-effects (co)variances:
Conditional model:
Groups Name Std.Dev.
facility (Intercept) 0.2193
Number of obs: 400 / Conditional model: facility, 4
Fixed Effects:
Conditional model:
(Intercept) phaseIntervention
1.89473 -0.01547
Dispersion model:
(Intercept) facilityB facilityC facilityD
-4.279 9.191 10.154 3.657