I'm hoping to plot my global effect s(CYR.std)
, along with my site deviations s(CYR.std, fSite, bs = "fs")
term in ggplot. I can plot one at a time, but can't figure out how to include both in the same plot. I'm not sure if I'm specifying things correctly along the way, or how to add the s(CYR.std)
term on top of the factor-smooth term, for comparison.
set.seed(12345)
library(mgcv)
library(gratia)
# Hypothetical fish counts from negative binomial distribution
df <- as.data.frame(rnbinom(1000, mu = 0.6971, size = 1))
df$year <- rep(2011:2020, each=100)
df$CYR.std <- df$year - min(df$year)
df$fCYR <- as.factor(df$year)
df$site <- seq(1, 50, 1)
df$fSite <- as.factor(df$site)
df$season <- rep(c("DRY", "WET"), each=50)
df$fSeason <- as.factor(df$season)
# Salinity (continuous covariate)
df$sal <- sample(0.5:40, 1000, replace = TRUE)
names(df)[1] <- "count"
m <- bam(count ~ s(sal) +
s(CYR.std) +
fSeason +
s(CYR.std, by = fSeason) +
s(CYR.std, fSite, bs = "fs") +
s(fCYR, bs = "re"),
method = "fREML",
discrete = TRUE,
select = TRUE,
family = nb(link = "log"),
data = df)
# Site deviations from global term
ds <- data_slice(m, CYR.std = evenly(CYR.std),
fSite = evenly(fSite))
fv <- fitted_values(m, data = ds, scale = "response") # density
ggplot(fv, aes(x = CYR.std, y = fitted, color = fSite)) +
# geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2) +
geom_line()
# Global term
ds2 <- data_slice(m, CYR.std = evenly(CYR.std))
fv2 <- fitted_values(m, data = ds2, scale = "response") # density
ggplot(fv2, aes(x = CYR.std, y = fitted)) +
geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2) +
geom_line()
Is this the correct way to specify plotting the 'fs' term and how do I combine both plots?
While Allan's answer is correct from the plotting point of view, the OP has incorrectly specified how the predicted values are generated if the aim is to plot the global effect on top of the site specific effects.
Here's a corrected version
library(mgcv)
library(gratia)
library(dplyr)
library(ggplot2)
Your model has the following smooths:
gratia::smooths(m)
> gratia::smooths(m)
[1] "s(sal)" "s(CYR.std)" "s(CYR.std):fSeasonDRY"
[4] "s(CYR.std):fSeasonWET" "s(CYR.std,fSite)" "s(fCYR)"
and there is the parametric fSeason
effects.
When you generate predictions from a GAM you need to provide values for all variables used in the model. But as this model is additive we can leave out the effects of one or more terms. We do this by selecting terms or excluding terms in the call to predict.gam()
or gratia::fitted_values()
.
If we look at your first data_slice()
# Site deviations from global term
ds <- data_slice(m, CYR.std = evenly(CYR.std),
fSite = evenly(fSite))
we see that values have been created for all the unstated variables in the the model, including sal
and fSeason
> ds
# A tibble: 5,000 × 5
CYR.std fSite fSeason sal fCYR
<dbl> <fct> <fct> <dbl> <fct>
1 0 1 DRY 19.5 2011
2 0 2 DRY 19.5 2011
3 0 3 DRY 19.5 2011
4 0 4 DRY 19.5 2011
5 0 5 DRY 19.5 2011
6 0 6 DRY 19.5 2011
7 0 7 DRY 19.5 2011
8 0 8 DRY 19.5 2011
9 0 9 DRY 19.5 2011
10 0 10 DRY 19.5 2011
# ℹ 4,990 more rows
# ℹ Use `print(n = ...)` to see more rows
and the predictions are conditional upon all these values.
We can remove the effects of everything but s(CYR.std)
and s(CYR.std,fSite)
, but getting these predictions to be agnostic of the season is trickier because of the treatment contrasts; the intercept term will be for the reference season, DRY
. Either you'll need to accept this and indicate that these are conditional upon the season, or produce predictions for both seasons. And if you do that, then should you include the factor by smooth effects of CYR.std)
?
Let's assume you don't care that the predictions are for the DRY
season. Then to exclude everything but the two required effects we would do:
fv <- fitted_values(m, data = ds, scale = "response",
terms = c("(Intercept)", "s(CYR.std)", "s(CYR.std,fSite)"))
That will give only the effects of the two requested smooths and the model constant term.
Likewise, for the global term only, we need to exclude certain effects:
# Global term
ds2 <- data_slice(m, CYR.std = evenly(CYR.std))
fv2 <- fitted_values(m, data = ds2, scale = "response",
terms = c("(Intercept)", "s(CYR.std)"))
I'm using terms
instead of exclude
here as it is easier to select which terms to include rather than to list the ones we want to exclude in this instance.
Now you can follow Allan's example, thought I have modified it to use the development version of {gratia} as I have modified the names of variables returned:
ggplot(fv2, aes(x = CYR.std, y = .fitted)) +
geom_ribbon(aes(ymin = .lower_ci, ymax = .upper_ci), alpha = 0.2) +
geom_line(data = fv, aes(x = CYR.std, y = .fitted, color = fSite)) +
geom_line(linewidth = 1, linetype = 2)
This gives
Note now how the credible interval that is on the global smooth is actually for this smooth (plus intercept)