r r-mice

Using a mids object with the marginaleffects package - newdata argument not working


I am trying to use marginaleffects::avg_comparisons with an imputed dataset (imputed via the mice package) but get an error if I try to use the newdata argument.

This works:

with(data_imputed, avg_comparisons(
  my_model,
  variables = list(Allocation = c("Control", "Intervention")),
  type = "link"
))

But adding the newdata argument does not:

with(data_imputed, avg_comparisons(
  my_model,
  variables = list(Allocation = c("Control", "Intervention")),
  type = "link",
  newdata = datagrid(age = 50)
))

The error message is:

Error in evalup(calltmp) : 
  Unable to compute predicted values with this model. This error can arise when
  `insight::get_data()` is unable to extract the dataset from the model object, or when the
  data frame was modified since fitting the model. You can try to supply a different dataset
  to the `newdata` argument.
  
  Error in model.frame.default(delete.response(terms(object, fixed.only = TRUE, : variable
  lengths differ (found for 'scale(age)')
   with(data_imputed, glmer(reponse_variable ~ scale(age) + Allocation +
   scale(age)*Allocation + (1|RandomEffect), family="binomial",
   control=glmerControl(optimizer="bobyqa",optCtrl = list(maxfun =
   2e5))))

Solution

  • You could try the modelbased package, which uses the marginaleffects package internally as its backend (so it supports the same range of models, just with a slightly different user interface) and has two functions for dealing with multiple imputation: pool_predictions() and pool_contrasts().

    The basic idea is that you run estimate_means(), which calculates predictions/marginal means, on each imputed data set and then pool those results.

    # example for multiple imputed datasets
    library(modelbased)
    data("nhanes2", package = "mice")
    imp <- mice::mice(nhanes2, printFlag = FALSE)
    predictions <- lapply(1:5, function(i) {
      m <- lm(bmi ~ age + hyp + chl, data = mice::complete(imp, action = i))
      estimate_means(m, "age")
    })
    pool_predictions(predictions)
    #> Estimated Marginal Means
    #> 
    #> age   |  Mean |   SE |        95% CI |  t(1)
    #> --------------------------------------------
    #> 20-39 | 30.54 | 1.67 | [9.38, 51.71] | 29.23
    #> 40-59 | 24.83 | 1.55 | [5.16, 44.50] | 22.84
    #> 60-99 | 23.15 | 1.71 | [1.48, 44.82] | 17.44
    #> 
    #> Variable predicted: bmi
    #> Predictors modulated: age
    #> Predictors averaged: hyp, chl (1.9e+02)
    #> 
    

    In your particular case, you can either use the newdata argument (which is passed on to the marginaleffects functions) or define your "data grid" directly in the arguments where you specify your focal terms. There are a lot of examples shown here.

    All (focal) variables specified in by (or contrast for estimate_contrasts()) and their "representative value" definitions are passed to insight::get_datagrid(). This is where you find one difference in the user interface between modelbased and marginaleffects. If you write by = "age=50", this internally creates a corresponding data grid, thus being a "convenient" shortcut.
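
As a rough sketch of how the same idea might look for contrasts, the nhanes2 example can be adapted to use estimate_contrasts() and pool_contrasts(). The model formula, the fixed value chl = 200 and the object names contrasts_list and pooled are illustrative assumptions, not taken from the question, and the sketch assumes that estimate_contrasts() accepts the by = "variable=value" shorthand described above.

```r
library(mice)
library(modelbased)

data("nhanes2", package = "mice")
imp <- mice::mice(nhanes2, printFlag = FALSE)  # 5 imputed datasets by default

# Fit the same model on each completed dataset and compute the contrast
# between the levels of hyp, holding chl fixed at 200 (illustrative value)
contrasts_list <- lapply(1:5, function(i) {
  m <- lm(bmi ~ hyp + chl, data = mice::complete(imp, action = i))
  estimate_contrasts(m, contrast = "hyp", by = "chl=200")
})

# Pool the per-imputation contrasts into a single result
pooled <- pool_contrasts(contrasts_list)
pooled
```

For the model in the question, the analogous call inside the loop would presumably use contrast = "Allocation" and by = "age=50", with the glmer() model refitted on each completed dataset via its data argument.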