rplotimputationmarginal-effects

Plot calculated average marginal effects from multiply imputed data in R


I am trying to plot the average marginal effects (AME) of logit regressions in R after I have multiply imputed data with m = 100.

I am aware of how to plot AME calculated in single datasets, such as using the package 'sjPlot::plot_model()' or 'marginaleffects::plot_slopes()'. However, I cannot figure out how I can plot AME that are pooled over the m = 100 imputed data sets.

Below is a reproducible example of what my results are so far.

library(mice)
library(marginaleffects)


## creates dummy for further analysis
tmpdf<-iris
virginica<-ifelse(tmpdf$Species=="virginica", 1,0)
df2<-data.frame(Sepal.Length = tmpdf$Sepal.Length,
                Sepal.Width = tmpdf$Sepal.Width,
                virginica = virginica)

## random generate data
missdf<-mice::ampute(df2)$amp

## use mice package to impute m = 100 data
impdf<-mice::mice(missdf,m=100,maxit=1,seed=1234)

# creates dummy on the outcome variable (Species)


## check pattern of imputed data
# densityplot(impdf)

## estimate logistic model using and then return pooled estimates 
## using Rubin's rule
model<-with(impdf, 
            glm(virginica ~ Sepal.Length + Sepal.Width, 
                family=binomial("logit")))

## calcuates marginal effects and returns pooled results
marginal<-marginaleffects::avg_slopes(model)
marginal

What I had in mind was rather something else Below is a plot that I creat using one dataset

#full data (no missing imputation)
tmpdf<-iris
virginica<-ifelse(tmpdf$Species=="virginica", 1,0)
df2<-data.frame(Sepal.Length = tmpdf$Sepal.Length,
                Sepal.Width = tmpdf$Sepal.Width,
                virginica = virginica)

model2<- glm(virginica ~ Sepal.Length + Sepal.Width, 
             family=binomial("logit"), data=df2)

# plotting slope of virginia on Sepal.Length at values of Sepal.Width
plot_slopes(model2,variables = "Sepal.Length",condition="Sepal.Width")
``` 

Solution

  • The output of avg_slopes() is just a standard (but “decorated”) data frame, so you can use any of R’s plotting functions, like ggplot2.

    library(mice)
    library(ggplot2)
    library(marginaleffects)
    
    tmpdf <- iris
    virginica <- ifelse(tmpdf$Species == "virginica", 1, 0)
    df2 <- data.frame(Sepal.Length = tmpdf$Sepal.Length, Sepal.Width = tmpdf$Sepal.Width, virginica = virginica)
    missdf <- mice::ampute(df2)$amp
    impdf <- mice::mice(missdf, m = 100, maxit = 1, seed = 1234)
    model <- with(impdf, glm(virginica ~ Sepal.Length + Sepal.Width, family = binomial("logit")))
    marginal <- marginaleffects::avg_slopes(model)
    
    ggplot(marginal, aes(y = term, x = estimate, xmin = conf.low, xmax = conf.high)) +
        geom_pointrange() +
        theme_minimal()
    

    Edit for new question in comments.

    This is a bit tricky to do because using datagrid() inside the call itself does not work for multiple imputation as it would for other complete data models. So we proceed in two steps: First, create a drawing grid, then plot.

    nd <- datagrid(
        newdata = tmpdf,
        Sepal.Width = seq(from = min(tmpdf$Sepal.Width), to = max(tmpdf$Sepal.Width), length.out = 100)
    )
    datplot <- slopes(
        model,
        variables = "Sepal.Length",
        newdata = nd)
    
    ggplot(datplot, aes(x = Sepal.Width, y = estimate, ymin = conf.low, ymax = conf.high)) +
        geom_ribbon(alpha = .2) +
        geom_line()