r plot predict lme4 mumin

Plotting results of logistic regression with binomial data from mixed effects model (lme4) with model averaging (MuMIn)


I'm trying to display the results of a logistic regression. My model was fit using glmer() from the lme4 package, and I then used MuMIn for model averaging.

Simplified version of my model using the mtcars dataset:

glmer(vs ~ wt +  am + (1|carb), database, family = binomial, na.action = "na.fail")

My desired output is two plots showing the predicted probability that vs = 1: one for wt, which is continuous, and one for am, which is binary.

I got this much working after comments from @KamilBartoń:

database <- mtcars

# Scale data
database$wt <- scale(mtcars$wt)
database$am <- scale(mtcars$am)

# Make global model
model.1 <- glmer(vs ~ wt + am + (1|carb), database, family = binomial, na.action = "na.fail")

# Model selection
model.1.set <- dredge(model.1, rank = "AICc")

# Get models with <10 delta AICc
top.models.1 <- get.models(model.1.set,subset = delta<10)

# Model averaging
model.1.avg <- model.avg(top.models.1)

# make dataframe with all values set to their mean
xweight <- as.data.frame(lapply(lapply(database[, -1], mean), rep, 100))

# add new sequence of wt to xweight along range of data
xweight$wt <- (wt = seq(min(database$wt), max(database$wt), length = 100))

# predict new values
yweight <- predict(model.1.avg, newdata = xweight, type="response", re.form=NA)

# Make plot 
plot(database$wt, database$vs, pch = 20, xlab = "WEIGHT (g)", ylab = "VS")

# Add predicted line
lines(xweight$wt, yweight)

Produces:

[image: plot of the predicted line over the scaled data]

The remaining issue is that the data are scaled and centred around 0, which makes the graph hard to interpret. I tried to unscale the data using an answer from @BenBolker to this question, but the result does not display correctly:

## Ben Bolker's unscale function:
## scale variable x using center/scale attributes of variable y
scfun <- function(x, y) {
  scale(x,
        center = attr(y, "scaled:center"),
        scale = attr(y, "scaled:scale"))
}

## scale prediction frame with scale values of original data -- for all variables
xweight_sc <- transform(xweight,
                        wt = scfun(wt, database$wt),
                        am = scfun(am, database$am))

# predict new values
yweight <- predict(model.1.avg, newdata = xweight_sc, type="response", re.form=NA)

# Make plot 
plot(mtcars$wt, mtcars$vs, pch = 20, xlab = "WEIGHT (g)", ylab = "VS")

# Add predicted line
lines(xweight$wt, yweight)

Produces:

[image: plot of the predicted line over the unscaled data]

I've tried this a few different ways but can't work out what the problem is. What have I done wrong?

Also, another remaining issue: How do I make a binomial plot for am?


Solution

  • setup

    library(lme4)
    library(MuMIn)
    database <- mtcars
    database$wt <- scale(mtcars$wt)
    database$am <- scale(mtcars$am)
    model.1 <- glmer(vs ~ wt + am + (1|carb), database, family = binomial, na.action = "na.fail")
    model.1.set <- dredge(model.1, rank = "AICc")
    top.models.1 <- get.models(model.1.set,subset = delta<10)
    model.1.avg <- model.avg(top.models.1)
    

    Answer

    The problem at hand seems to be creating a graph of the average effect, similar to what the effects package (or the ggeffects package) produces. Thomas got pretty close, but a small misunderstanding of Ben Bolker's answer led to applying the scaling in the wrong direction, which in this case scaled the predictors twice. This can be seen by pulling out the relevant snippets of the code above.

    database$wt <- scale(mtcars$wt)
    database$am <- scale(mtcars$am)
    
    # More code
    
    xweight <- as.data.frame(lapply(lapply(database[, -1], mean), rep, 100))
    xweight$wt <- (wt = seq(min(database$wt), max(database$wt), length = 100))
    
    # more code 
    
    scfun <- function(x, y) {
      scale(x,
            center = attr(y, "scaled:center"),
            scale = attr(y, "scaled:scale"))
    }
    xweight_sc <- transform(xweight,
                            wt = scfun(wt, database$wt),
                            am = scfun(am, database$am))
    

    From this we see that xweight is already on the scaled axis (it is built from database, not mtcars), so applying scfun scales it a second time. In effect we obtain

    sc <- attr(database$wt, 'scaled:scale')
    ce <- attr(database$wt, 'scaled:center')
    xweight_sc$wt <- scale(scale(seq(min(mtcars$wt), max(mtcars$wt), length = 100), ce, sc), ce, sc)
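
The effect of the second scaling can be checked numerically. The following is a minimal sketch using only base R and mtcars$wt; the names wt_scaled, grid_scaled and grid_twice are introduced here for illustration and do not appear in the question.

```r
# Scale wt once, as done before fitting the model
wt_scaled <- scale(mtcars$wt)
ce <- attr(wt_scaled, "scaled:center")  # mean of the raw wt
sc <- attr(wt_scaled, "scaled:scale")   # sd of the raw wt

# Prediction grid built on the scaled variable, as in the question
grid_scaled <- seq(min(wt_scaled), max(wt_scaled), length.out = 100)

# Applying the same centre/scale again (what scfun does) moves the grid
grid_twice <- as.numeric(scale(grid_scaled, center = ce, scale = sc))

range(grid_scaled)  # the range of values the model was fitted on
range(grid_twice)   # shifted: no longer matches the fitted range
```

Because the twice-scaled grid no longer lines up with the values the model saw, predictions made from it are for the wrong part of the wt axis.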
    

    What Ben Bolker describes in his answer, however, is the situation where the model uses scaled predictors but the data used for prediction are not yet scaled. Here the prediction data are already scaled correctly, and we only want to interpret them on the original scale, so we simply have to invert the process. There are two ways to do this.

    Method 1: changing breaks in ggplot

    Note: in base R the same can be done by drawing the axis with custom tick labels via axis().

    The first method is to change only the axis: the data stay on the scaled axis used by the model, and only the tick labels are converted back to the original units.

    # Extract scales
    sc <- attr(database$wt, 'scaled:scale')
    ce <- attr(database$wt, 'scaled:center')
    # Create plotting and prediction data
    n <- 100
    pred_data <- aggregate(. ~ 1, data = mtcars, FUN = mean)[rep(1, n), ]
    # The model was fitted on scaled predictors, so hold am at its scaled mean
    pred_data$am <- mean(database$am)
    pred_data$wt <- seq(min(database$wt), max(database$wt), length = n)
    pred_data$vs <- predict(model.1.avg, newdata = pred_data, type = 'response', re.form = NA)
    # Create breaks
    library(scales) # for pretty_breaks and label_number
    breaks <- pretty_breaks()(pred_data$wt, 4) # 4 is arbitrary
    # Unscale the breaks to be used as labels
    labels <- label_number()(breaks * sc + ce) # see method 2 for explanation
    # Finally we plot the result
    library(ggplot2)
    ggplot(data = pred_data, aes(x = wt, y = vs)) + 
      geom_line() + 
      geom_point(data = database) + 
      scale_x_continuous(breaks = breaks, labels = labels) #to change labels.
    

    which is the desired result. Note that there are no confidence bands. This is due to the lack of a closed form for the confidence intervals of the original model, and it seems likely that the best way to get any estimate at all is bootstrapping.
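
For readers who prefer base R graphics, the relabelling idea from method 1 can be sketched as follows. This is only an illustration under stated assumptions: it plots the raw points without the model-averaged line, and breaks_base and labels_base are names made up for this example.

```r
# Scaled predictor and its scaling attributes
wt_scaled <- scale(mtcars$wt)
ce <- attr(wt_scaled, "scaled:center")
sc <- attr(wt_scaled, "scaled:scale")

# Tick positions on the scaled axis, kept inside the data range
breaks_base <- pretty(as.numeric(wt_scaled), n = 4)
breaks_base <- breaks_base[breaks_base >= min(wt_scaled) &
                           breaks_base <= max(wt_scaled)]

# Tick labels converted back to the original units
labels_base <- round(breaks_base * sc + ce, 1)

# Plot on the scaled axis, suppress the default x axis, then add our own
plot(as.numeric(wt_scaled), mtcars$vs, pch = 20, xaxt = "n",
     xlab = "wt (original units)", ylab = "vs")
axis(side = 1, at = breaks_base, labels = labels_base)
```

A predicted line computed on the scaled grid could then be added with lines(), since the plotting coordinates are still on the scaled axis.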

    Method 2: Unscaling

    In unscaling we simply invert what scale does. Since scale(x) = (x - mean(x))/sd(x), solving for x gives x = scale(x) * sd(x) + mean(x). This is applied to the plotting data only, after the predictions have been made; the prediction itself must still use the scaled data:

    # unscale the variables 
    pred_data$wt <- pred_data$wt * sc + ce
    database$wt <- database$wt * sc + ce
    
    # Finally plot the result
    ggplot(data = pred_data, aes(x = wt, y = vs)) + 
      geom_line() + 
      geom_point(data = database)
    

    which is the desired result.
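
As a quick sanity check of the inversion used in method 2, the round trip can be verified in base R. This is a small sketch; x, z and x_back are names introduced here for illustration only.

```r
# Original variable and its scaled version
x <- mtcars$wt
z <- scale(x)

# scale() stores the centre and scale it used as attributes
ce <- attr(z, "scaled:center")  # mean(x)
sc <- attr(z, "scaled:scale")   # sd(x)

# Invert the scaling: x = z * sd(x) + mean(x)
x_back <- as.numeric(z) * sc + ce

# Should be TRUE up to floating point tolerance
all.equal(x_back, x)
```

The same two attributes are what sc and ce hold in the answer's code, so multiplying by sc and adding ce recovers the original units.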