rgammgcvgratia

Modifying plots of the compare_smooths function in gratia package (R)


I am using the draw and compare_smooths functions in the gratia package in R to create plots visualizing smooths from two logistic GAMs. I would like to edit these plots to (1) include raw data; (2) have a logistic transformed y-axis so that y-axis values represent probabilities between 0-1. Is there an easy way to do this using these functions?

Using advice from these posts (Gratia package: probability instead of log odds for plotting generalized additive model (GAM) and Plotting GAMM model results with package gratia) I can successfully plot probabilities instead of log odds for a single model, but am unsure how to do this when using draw on compare_smooths objects. I also am unsure how even for a single model to show the raw data on the plot with the rescaled y axis.

Also, for plotting a single model I know that adding "residuals=T" will achieve what I want, but again, cannot do this using compare_smooths.

Here is some example code



#simulate data
set.seed(9)
df1 <- data.frame(response=c(rep(0,500), rep(1,500)),
                  p1 = c(rnorm(500,100,5),rnorm(500,110,3)),
                  p2 = rnorm(1000,10,2))


df2 <- data.frame(response=sample(c(0,1),1000, replace = T),
                  p1 = rnorm(1000,80,10),
                  p2 = rnorm(1000,15,1))

#create models
m1 <- gam(response ~ s(p1,bs="cr") + s(p2, bs="cr"),
                          data = df1, family=binomial, method = "REML", select=T)
m2 <- gam(response ~ s(p1,bs="cr") + s(p2, bs="cr"),
          data = df2, family=binomial, method = "REML", select=T)

#compare smooths
comp <- compare_smooths(m1,m2)

#visualize compared smooths
draw(comp)

#example of desired y-axis scaling for single model 
#(two different methods, similar results)
draw(smooth_estimates(m1), fun = inv_link(m1))
draw(m1, constant = coef(m1)[1], fun = inv_link(m1))

#example of raw data on plot for single model
draw(m1, residuals=T)


Solution

  • It would be much easier to predict from the respective models, on the response scale, and then plot the resulting fitted values. As this is actually on the response scale (as it includes the model intercept) this is arguably a better plot.

    library("ggplot2")
    library("dplyr")
    library("gratia") # use dev version on github or modify #! ggplot
    
    N <- 100
    r_p1 <- range(c(df1$p1, df2$p1))
    ds <- data_slice(m1, p1 = evenly(p1, lower = r_p1[1], upper = r_p1[2], n = N))
    
    fv_p1_m1 <- fitted_values(m1, data = ds, exclude = "s(p2)")
    fv_p1_m2 <- fitted_values(m2, data = ds, exclude = "s(p2)")
    
    fv_p1 <- fv_p1_m1 |>
      bind_rows(fv_p1_m2) |>
      mutate(.model = rep(c("m1", "m2"), each = N))
    
    fv_p1 |> #! modify variable names in aes() calls below if using CRAN's gratia
      ggplot(aes(x = p1, y = .fitted, group = .model)) +
      geom_ribbon(aes(ymin = .lower_ci, ymax = .upper_ci,
        fill = .model), alpha = 0.2) +
      geom_line(aes(colour = .model))
    

    Not tested; am away from my machine just now.

    If you really don't want the intercept included (so something directly comparable to what compare_smooths() would do, replace

    exclude = "s(p2)"
    

    with

    terms = "s(p1)"
    

    in the fitted_values() calls. For this to work you need to be using a pretty new version of {mgcv} as Simon changed the way terms and exclude worked not that long ago.