I am fitting GAMs in R using the mgcv
R package. I then want to perform posterior predictive checks to compare observed and simulated outcomes. A toy example using the performance
R package:
library(mgcv)
library(performance)
df <- mtcars |>
tibble::as_tibble() |>
dplyr::mutate(carb = as.integer(carb))
model <- mgcv::gam(
carb~s(wt, k=5),
data = df,
family = mgcv::ocat(R=8)
)
plot(check_model(model))
That does not work:
Error in d[, !(names(data) %in% all.varying), drop = FALSE] :
incorrect number of dimensions
I am trying to get something similar to this:
The only issue seems to be that when the performance
calls simulate
, what it gets is not supported. I can obtain simulated outcomes using mgcViz
:
library(ggplot2)
sims <- mgcViz::simulate.gam(model, newdata=df, nsim=10) |> tibble::as_tibble() |>
tidyr::pivot_longer(dplyr::everything())
ggplot(sims, aes(x = value)) +
geom_point(aes(y = after_stat(count)), stat = "count")
The problem is that I only obtain one point, not the jitter of points as in the image above. How can I achieve that?
EDIT:
After accepting the answer below, I realised that the results of the proposed solution and those of performance
are not the same.
library(mgcv)
library(performance)
library(patchwork)
df <- mtcars |>
tibble::as_tibble() |>
dplyr::mutate(carb = factor(carb, ordered = T))
model <- MASS::polr(
formula = carb ~ wt,
data = df,
Hess = T,
method = "logistic"
)
sims <- simulate(model, newdata = df, nsim = 100) |>
tibble::as_tibble() |>
tidyr::pivot_longer(dplyr::everything(),
names_to = 'replicate',
values_to = 'carb')
sims_summ <- sims |> dplyr::count(carb, replicate)
p1 <- ggplot(mapping = aes(carb)) +
geom_jitter(
aes(y = n, color = 'Model-predicted data'),
sims_summ,
height = 0,
width = 0.1,
alpha = 0.1
) +
stat_summary(
aes(y = n, color = 'Model-predicted data'),
sims_summ,
geom = 'pointrange',
fun.data = 'mean_cl_boot'
) +
geom_point(aes(color = 'Observed data'),
df,
stat = 'count',
size = 5) +
scale_color_manual(values = c('black', 'firebrick')) +
theme_classic()
p2 <- plot(check_predictions(model, iterations = 100))
p1 + p2
performance
does posterior predictive using check_predictions
. Using:
check_predictions(model)
Gives the much more informative error:
Error: Could not simulate responses. Maybe there is no `simulate()` for objects of class `gam`?
As you say, there is mgcViz::simulate.gam
, and loading that package will now find a simulate function. However, it returns data in a different format (an array) than stats::simulate
(which gives a list), and so it will not work with the functions that performance
provides.
You can plot the simulated data and observed data pretty easily yourself though. E.g.:
library(ggplot2)
sims <- mgcViz::simulate.gam(model, newdata=df, nsim=100) |>
tibble::as_tibble() |>
tidyr::pivot_longer(dplyr::everything(), names_to = 'replicate', values_to = 'carb')
ggplot(mapping = aes(carb)) +
geom_density(
aes(color = 'Model-predicted data', group = replicate),
sims, key_glyph = 'smooth'
) +
geom_density(aes(color = 'Observed data'), df, linewidth = 2, key_glyph = 'smooth') +
scale_color_manual(values = c(alpha('black', 0.3), 'firebrick')) +
theme_classic()
Or perhaps like so:
sims_summ <- sims |> dplyr::count(carb, replicate)
ggplot(mapping = aes(carb)) +
geom_jitter(
aes(y = n, color = 'Model-predicted data'),
sims_summ, height = 0, width = 0.1, alpha = 0.1
) +
stat_summary(
aes(y = n, color = 'Model-predicted data'),
sims_summ, geom = 'pointrange', fun.data = 'mean_cl_boot'
) +
geom_point(aes(color = 'Observed data'), df, stat = 'count', size = 5) +
scale_color_manual(values = c('black', 'firebrick')) +
theme_classic()