rggplot2multinomial

How do I create a predicted probabilities graph with confidence intervals for a multinomial logistic regression with an interaction term in R?


Is there an easier way to create a predicted probabilities graph with confidence intervals for a multinomial logistic regression with an interaction term? I have the following code below, which seems like a very long way to do something that I could just do with one line of code for other regression types. I'm also wondering if creating three graphs for each of my y levels if the best way to represent the results.

Also, I was wondering if there's a way to make the geom_ribbon to look smoother.

# data frame
df <- data.frame(
  rating = c("1 Better", "1 Better", "1 Better", "2 Medium", "2 Medium", "2 Medium", "3 Worse", "3 Worse", "3 Worse","1 Better", "1 Better", "1 Better", "2 Medium", "2 Medium", "2 Medium", "3 Worse", "3 Worse", "3 Worse"),
  count = c(2, 0, 5, 8, 10, 3, 2, 1, 0, 0, 9, 1, 0, 5, 7, 2, 9, 0),
  case = c("Y", "N", "Y", "Y", "N", "Y", "N", "Y", "N", "N", "Y", "N", "Y", "N", "N", "Y", "N", "Y"),
  cool = c(1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1)
)

# regression model
fit <- multinom(rating ~ count * case + cool, data = df)

It appears that the next step would be to generate data for the prediction.

# new data for prediction
new_data <- expand.grid(
  count = seq(min(df$count), max(df$count), length.out = 100),
  case = unique(df$case),
  cool = unique(df$cool)
)

# simulations
n_sims <- 1000

# simulate predictions
simulate_preds <- function(model, newdata, n) {
  preds <- predict(model, newdata = newdata, type = "probs")
  simulated_preds <- replicate(n, {
    beta_sims <- matrix(rnorm(length(preds), 0, 0.1), ncol = nrow(preds))
    preds + t(beta_sims)
  })
  simulated_preds
}

simulated_probs <- simulate_preds(fit, newdata = new_data, n = n_sims)

# confidence intervals
quantiles <- apply(simulated_probs, c(1, 2), function(x) {
  quantile(x, c(0.025, 0.975))
})

So when I combine the quantiles, I would have three separate sets of upper and lower bounds:

# combine to create new df
new_data <- cbind(new_data, quantiles[1,,], quantiles[2,,])

# rename columns
colnames(new_data) <- c("count", "case", "cool", "Better_Low", "Medium_Low", "Worse_Low", "Better_High", "Medium_High", "Worse_High")

Here's my graph:

ggplot(new_data, aes(x = count, color = factor(cool))) +
  facet_wrap(~ yes) +
  geom_ribbon(aes(ymin = Better_Low, ymax = Better_High)) +
  labs(x = "Count", y = "Predicted Probability", color = "Cool", linetype = "Case") +
  ggtitle("Predicted Probabilities by Rating, Case, and Coolness with Confidence Intervals") +
  theme_bw()

And I repeat this for medium and worse...

But the I add the stat and method commands for the geom_ribbon function, it throws an error:

ggplot(new_data, aes(x = count, color = factor(cool))) +
  facet_wrap(~ yes) +
  geom_ribbon(aes(ymin = Better_Low, ymax = Better_High), alpha = 0.3, stat = "smooth", method = "loess") +
  labs(x = "Count", y = "Predicted Probability", color = "Cool", linetype = "Case") +
  ggtitle("Predicted Probabilities by Rating, Case, and Coolness with Confidence Intervals") +
  theme_bw()

`geom_smooth()` using formula = 'y ~ x'
Error in `geom_ribbon()`:
! Problem while computing stat.
ℹ Error occurred in the 1st layer.
Caused by error in `compute_layer()`:
! `stat_smooth()` requires the following missing aesthetics: y
Run `rlang::last_trace()` to see where the error occurred.

Solution

  • You could use the marginaleffects package for this. Your example data is too small and weird to produce reasonable-looking plots, but the code below shows the results anyway. If you want to learn more, you can read these two vignettes:

    library(marginaleffects)
    library(nnet)
    
    df <- data.frame(
      rating = c("1 Better", "1 Better", "1 Better", "2 Medium", "2 Medium", "2 Medium", "3 Worse", "3 Worse", "3 Worse","1 Better", "1 Better", "1 Better", "2 Medium", "2 Medium", "2 Medium", "3 Worse", "3 Worse", "3 Worse"),
      count = c(2, 0, 5, 8, 10, 3, 2, 1, 0, 0, 9, 1, 0, 5, 7, 2, 9, 0),
      case = c("Y", "N", "Y", "Y", "N", "Y", "N", "Y", "N", "N", "Y", "N", "Y", "N", "N", "Y", "N", "Y"),
      cool = c(1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1)
    )
    
    fit <- multinom(rating ~ count * case + cool, data = df, trace = FALSE)
    
    plot_predictions(fit,
        type = "probs",
        condition = c("count", "case", "group")
    )
    

    Disclaimer: I am the marginaleffects author.