Is there an easier way to create a predicted probabilities graph with confidence intervals for a multinomial logistic regression with an interaction term? I have the following code below, which seems like a very long way to do something that I could just do with one line of code for other regression types. I'm also wondering if creating three graphs for each of my y
levels if the best way to represent the results.
Also, I was wondering if there's a way to make the geom_ribbon
to look smoother.
# data frame
df <- data.frame(
rating = c("1 Better", "1 Better", "1 Better", "2 Medium", "2 Medium", "2 Medium", "3 Worse", "3 Worse", "3 Worse","1 Better", "1 Better", "1 Better", "2 Medium", "2 Medium", "2 Medium", "3 Worse", "3 Worse", "3 Worse"),
count = c(2, 0, 5, 8, 10, 3, 2, 1, 0, 0, 9, 1, 0, 5, 7, 2, 9, 0),
case = c("Y", "N", "Y", "Y", "N", "Y", "N", "Y", "N", "N", "Y", "N", "Y", "N", "N", "Y", "N", "Y"),
cool = c(1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1)
)
# regression model
fit <- multinom(rating ~ count * case + cool, data = df)
It appears that the next step would be to generate data for the prediction.
# new data for prediction
new_data <- expand.grid(
count = seq(min(df$count), max(df$count), length.out = 100),
case = unique(df$case),
cool = unique(df$cool)
)
# simulations
n_sims <- 1000
# simulate predictions
simulate_preds <- function(model, newdata, n) {
preds <- predict(model, newdata = newdata, type = "probs")
simulated_preds <- replicate(n, {
beta_sims <- matrix(rnorm(length(preds), 0, 0.1), ncol = nrow(preds))
preds + t(beta_sims)
})
simulated_preds
}
simulated_probs <- simulate_preds(fit, newdata = new_data, n = n_sims)
# confidence intervals
quantiles <- apply(simulated_probs, c(1, 2), function(x) {
quantile(x, c(0.025, 0.975))
})
So when I combine the quantiles, I would have three separate sets of upper and lower bounds:
# combine to create new df
new_data <- cbind(new_data, quantiles[1,,], quantiles[2,,])
# rename columns
colnames(new_data) <- c("count", "case", "cool", "Better_Low", "Medium_Low", "Worse_Low", "Better_High", "Medium_High", "Worse_High")
Here's my graph:
ggplot(new_data, aes(x = count, color = factor(cool))) +
facet_wrap(~ yes) +
geom_ribbon(aes(ymin = Better_Low, ymax = Better_High)) +
labs(x = "Count", y = "Predicted Probability", color = "Cool", linetype = "Case") +
ggtitle("Predicted Probabilities by Rating, Case, and Coolness with Confidence Intervals") +
theme_bw()
And I repeat this for medium and worse...
But the I add the stat and method commands for the geom_ribbon function, it throws an error:
ggplot(new_data, aes(x = count, color = factor(cool))) +
facet_wrap(~ yes) +
geom_ribbon(aes(ymin = Better_Low, ymax = Better_High), alpha = 0.3, stat = "smooth", method = "loess") +
labs(x = "Count", y = "Predicted Probability", color = "Cool", linetype = "Case") +
ggtitle("Predicted Probabilities by Rating, Case, and Coolness with Confidence Intervals") +
theme_bw()
`geom_smooth()` using formula = 'y ~ x'
Error in `geom_ribbon()`:
! Problem while computing stat.
ℹ Error occurred in the 1st layer.
Caused by error in `compute_layer()`:
! `stat_smooth()` requires the following missing aesthetics: y
Run `rlang::last_trace()` to see where the error occurred.
You could use the marginaleffects
package for this. Your example data is too small and weird to produce reasonable-looking plots, but the code below shows the results anyway. If you want to learn more, you can read these two vignettes:
library(marginaleffects)
library(nnet)
df <- data.frame(
rating = c("1 Better", "1 Better", "1 Better", "2 Medium", "2 Medium", "2 Medium", "3 Worse", "3 Worse", "3 Worse","1 Better", "1 Better", "1 Better", "2 Medium", "2 Medium", "2 Medium", "3 Worse", "3 Worse", "3 Worse"),
count = c(2, 0, 5, 8, 10, 3, 2, 1, 0, 0, 9, 1, 0, 5, 7, 2, 9, 0),
case = c("Y", "N", "Y", "Y", "N", "Y", "N", "Y", "N", "N", "Y", "N", "Y", "N", "N", "Y", "N", "Y"),
cool = c(1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1)
)
fit <- multinom(rating ~ count * case + cool, data = df, trace = FALSE)
plot_predictions(fit,
type = "probs",
condition = c("count", "case", "group")
)
Disclaimer: I am the marginaleffects
author.