rlme4mixed-modelspoissonr-mice

Quasi-Poisson mixed-effect model on overdispersed count data from multiple imputed datasets in R


I'm dealing with problems of three parts that I can solve separately, but now I need to solve them together:

  1. extremely skewed, over-dispersed dependent count variable (the number of incidents while doing something),
  2. necessity to include random effects,
  3. lots of missing values -> multiple imputation -> 10 imputed datasets.

To solve the first two parts, I chose a quasi-Poisson mixed-effect model. Since stats::glm isn't able to include random effects properly (or I haven't figured it out) and lme4::glmer doesn't support the quasi-families, I worked with glmer(family = "poisson") and then adjusted the std. errors, z statistics and p-values as recommended here and discussed here. So I basically turn Poisson mixed-effect regression into quasi-Poisson mixed-effect regression "by hand".

This is all good with one dataset. But I have 10 of them.

I roughly understand the procedure of analyzing multiple imputed datasets – 1. imputation, 2. model fitting, 3. pooling results (I'm using mice library). I can do these steps for a Poisson regression but not for a quasi-Poisson mixed-effect regression. Is it even possible to A) pool across models based on a quasi-distribution, B) get residuals from a pooled object (class "mipo")? I'm not sure. Also I'm not sure how to understand the pooled results for mixed models (I miss random effects in the pooled output; although I've found this page which I'm currently trying to go through).

Can I get some help, please? Any suggestions on how to complete the analysis (addressing all three issues above) would be highly appreciated.

Example of data is here (repre_d_v1 and repre_all_data are stored in there) and below is a crucial part of my code.

library(dplyr); library(tidyr); library(tidyverse); library(lme4); library(broom.mixed); library(mice)
# please download "qP_data.RData" from the last link above and load them

## ===========================================================================================
# quasi-Poisson mixed model from single data set (this is OK)
# first run Poisson regression on df "repre_d_v1", then turn it into quasi-Poisson
modelSingle = glmer(Y ~ Gender + Age + Xi + Age:Xi + (1|Country) + (1|Participant_ID),
                    family = "poisson",
                    data = repre_d_v1)
# I know there are some warnings but it's because I share only a modified subset of data with you (:
printCoefmat(coef(summary(modelSingle))) # unadjusted coefficient table

# define quasi-likelihood adjustment function
quasi_table = function(model, ctab = coef(summary(model))) {
  phi = sum(residuals(model, type = "pearson")^2) / df.residual(model)
  qctab = within(as.data.frame(ctab),
                 {`Std. Error` = `Std. Error`*sqrt(phi)
                 `z value` = Estimate/`Std. Error`
                 `Pr(>|z|)` = 2*pnorm(abs(`z value`), lower.tail = FALSE)
                 })
  return(qctab)
}
printCoefmat(quasi_table(modelSingle)) # done, makes sense

## ===========================================================================================
# now let's work with more than one data set
# object "repre_all_data" of class "mids" contains 10 imputed data sets
# fit model using with() function, then pool()
modelMultiple = with(data = repre_all_data,
                     expr = glmer(Y ~ Gender + Age + Xi + Age:Xi + (1|Country) + (1|Participant_ID),
                                  family = "poisson"))
summary(pool(modelMultiple)) # class "mipo" ("mipo.summary")
# this has quite similar structure as coef(summary(someGLM))
# but I don't see where are the random effects?
# and more importantly, I wanted a quasi-Poisson model, not just Poisson model...
# ...but here it is not possible to use quasi_table function (defined earlier)...
# ...and that's because I can't compute "phi"

Solution

  • This seems reasonable, with the caveat that I'm only thinking about the computation, not whether this makes statistical sense. What I'm doing here is computing the dispersion for each of the individual fits and then applying it to the summary table, using a variant of the machinery that you posted above.

    ## compute dispersion values
    phivec <- vapply(modelMultiple$analyses,
                     function(model) sum(residuals(model, type = "pearson")^2) / df.residual(model),
                     FUN.VALUE = numeric(1))
    
    phi_mean <- mean(phivec)
    ss <- summary(pool(modelMultiple)) # class "mipo" ("mipo.summary")
    ## adjust
    qctab <- within(as.data.frame(ss),
    {   std.error <- std.error*sqrt(phi_mean)
        statistic <- estimate/std.error
        p.value <- 2*pnorm(abs(statistic), lower.tail = FALSE)
    })
    

    The results look weird (dispersion < 1, all model results identical), but I'm assuming that's because you gave us a weird subset as a reproducible example ...