rggplot2gamgratia

How to plot multiple GAMM smooths in one ggplot?


I'm hoping to plot my global effect s(CYR.std), along with my site deviations s(CYR.std, fSite, bs = "fs") term in ggplot. I can plot one at a time, but can't figure out how to include both in the same plot. I'm not sure if I'm specifying things correctly along the way, or how to add the s(CYR.std) term on top of the factor-smooth term, for comparison.

set.seed(12345)

library(mgcv)
library(gratia)

# Hypothetical fish counts from negative binomial distribution
df <- as.data.frame(rnbinom(1000, mu = 0.6971, size = 1))

df$year <- rep(2011:2020, each=100)
df$CYR.std <- df$year - min(df$year)
df$fCYR <- as.factor(df$year)

df$site <- seq(1, 50, 1)
df$fSite <- as.factor(df$site)

df$season <- rep(c("DRY", "WET"), each=50)
df$fSeason <- as.factor(df$season)

# Salinity (continuous covariate)
df$sal <- sample(0.5:40, 1000, replace = TRUE)

names(df)[1] <- "count"

m <- bam(count ~ s(sal) + 
           s(CYR.std) + 
           fSeason + 
           s(CYR.std, by = fSeason) + 
           s(CYR.std, fSite, bs = "fs") + 
           s(fCYR, bs = "re"), 
         method = "fREML",
         discrete = TRUE,
         select = TRUE, 
         family = nb(link = "log"),
         data = df)


# Site deviations from global term
ds <- data_slice(m, CYR.std = evenly(CYR.std),
                 fSite = evenly(fSite))

fv <- fitted_values(m, data = ds, scale = "response") # density

ggplot(fv, aes(x = CYR.std, y = fitted, color = fSite)) +
  # geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2) +
  geom_line()


# Global term
ds2 <- data_slice(m, CYR.std = evenly(CYR.std))

fv2 <- fitted_values(m, data = ds2, scale = "response") # density

ggplot(fv2, aes(x = CYR.std, y = fitted)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2) +
  geom_line()

Is this the correct way to specify plotting the 'fs' term and how do I combine both plots?


Solution

  • While Allan's answer is correct from the plotting point of view, the OP has incorrectly specified how the predicted values are generated if the aim is to plot the global effect on top of the site specific effects.

    Here's a corrected version

    library(mgcv)
    library(gratia)
    library(dplyr)
    library(ggplot2)
    

    Your model has the following smooths:

    gratia::smooths(m)
    
    > gratia::smooths(m)
    [1] "s(sal)"                "s(CYR.std)"            "s(CYR.std):fSeasonDRY"
    [4] "s(CYR.std):fSeasonWET" "s(CYR.std,fSite)"      "s(fCYR)"
    

    and there is the parametric fSeason effects.

    When you generate predictions from a GAM you need to provide values for all variables used in the model. But as this model is additive we can leave out the effects of one or more terms. We do this by selecting terms or excluding terms in the call to predict.gam() or gratia::fitted_values().

    If we look at your first data_slice()

    # Site deviations from global term
    ds <- data_slice(m, CYR.std = evenly(CYR.std),
                     fSite = evenly(fSite))
    

    we see that values have been created for all the unstated variables in the the model, including sal and fSeason

    > ds
    # A tibble: 5,000 × 5
       CYR.std fSite fSeason   sal fCYR 
         <dbl> <fct> <fct>   <dbl> <fct>
     1       0 1     DRY      19.5 2011 
     2       0 2     DRY      19.5 2011 
     3       0 3     DRY      19.5 2011 
     4       0 4     DRY      19.5 2011 
     5       0 5     DRY      19.5 2011 
     6       0 6     DRY      19.5 2011 
     7       0 7     DRY      19.5 2011 
     8       0 8     DRY      19.5 2011 
     9       0 9     DRY      19.5 2011 
    10       0 10    DRY      19.5 2011 
    # ℹ 4,990 more rows
    # ℹ Use `print(n = ...)` to see more rows
    

    and the predictions are conditional upon all these values.

    We can remove the effects of everything but s(CYR.std) and s(CYR.std,fSite), but getting these predictions to be agnostic of the season is trickier because of the treatment contrasts; the intercept term will be for the reference season, DRY. Either you'll need to accept this and indicate that these are conditional upon the season, or produce predictions for both seasons. And if you do that, then should you include the factor by smooth effects of CYR.std)?

    Let's assume you don't care that the predictions are for the DRY season. Then to exclude everything but the two required effects we would do:

    fv <- fitted_values(m, data = ds, scale = "response",
      terms = c("(Intercept)", "s(CYR.std)", "s(CYR.std,fSite)"))
    

    That will give only the effects of the two requested smooths and the model constant term.

    Likewise, for the global term only, we need to exclude certain effects:

    # Global term
    ds2 <- data_slice(m, CYR.std = evenly(CYR.std))
    fv2 <- fitted_values(m, data = ds2, scale = "response",
      terms = c("(Intercept)", "s(CYR.std)"))
    

    I'm using terms instead of exclude here as it is easier to select which terms to include rather than to list the ones we want to exclude in this instance.

    Now you can follow Allan's example, thought I have modified it to use the development version of {gratia} as I have modified the names of variables returned:

    ggplot(fv2, aes(x = CYR.std, y = .fitted)) +
      geom_ribbon(aes(ymin = .lower_ci, ymax = .upper_ci), alpha = 0.2) +
      geom_line(data = fv, aes(x = CYR.std, y = .fitted, color = fSite)) +
      geom_line(linewidth = 1, linetype = 2)
    

    This gives

    enter image description here

    Note now how the credible interval that is on the global smooth is actually for this smooth (plus intercept)