rmgcvgratia

How to add points to the diagram of GAM results displayed with the plot_smooths() function?


I cannot add points to the diagram of GAM results displayed with the plot_smooths() function of tidymv package.

I would like to add points to the diagram of GAM results displayed with the plot_smooths() function. For example;

library(mgcv)
library(tidymv)
library(gratia)

x<-c(0.17,0.19,0.18,0.33,0.24,0.2,0.2,0.19,0.36,0.29,0.24,0.24,0.21,0.23,0.31,0.36,0.33,0.24,0.31)
y<-c(8475,86,209,8230,8372,8475,8475,2867,149,8410,8161,8474,8451,282,7409,682,504,66,315)
dat<-data.frame(x,y)
gam <- gam(y ~ s(x), data=dat)
plot_smooths(gam, x)

enter image description here I wish to add points to the diagram displayed in this state.

I know that the plot.gam() and daw() functions of the gratia package display points when residuals = TRUE.

plot.gam(gam,residuals = TRUE)
draw(gam, residuals = TRUE)

enter image description here enter image description here

However, in this case, the y-axis values differed from the actual data values. I wanted to plot the values of the data usede. Please let me know if there is a way to plot the data values used in the plot.gam() function or the draw() function.

Thank you in advance.


Solution

  • The easiest way would be to predict from your model to get a nice smooth plot of the estimated smooth function and then add the observed data on top. For example, with gratia::fitted_values() and a little bit of ggplot() we have:

    library("mgcv")
    library("gratia")
    
    x <- c(0.17, 0.19, 0.18, 0.33, 0.24, 0.2, 0.2, 0.19, 0.36, 0.29, 0.24, 0.24, 
      0.21, 0.23, 0.31, 0.36, 0.33, 0.24, 0.31)
    y <- c(8475, 86, 209, 8230, 8372, 8475, 8475, 2867, 149, 8410, 8161, 8474, 
      8451, 282, 7409, 682, 504, 66, 315)
    dat <- data.frame(x, y)
    m <- gam(y ~ s(x), data = dat)
    
    # data to predict at, evenly in x, 100 values
    ds <- data_slice(m, x = evenly(x, n = 100))
    
    # compute fitted values
    fv <- fitted_values(m, data = ds)
    

    fv looks like this:

    > fv
    # A tibble: 100 × 6
        .row     x .fitted   .se .lower_ci .upper_ci
       <int> <dbl>   <dbl> <dbl>     <dbl>     <dbl>
     1     1 0.17    5349. 1770.     1880.     8818.
     2     2 0.172   5348. 1722.     1973.     8723.
     3     3 0.174   5347. 1675.     2064.     8631.
     4     4 0.176   5346. 1630.     2150.     8542.
     5     5 0.178   5345. 1587.     2234.     8456.
     6     6 0.180   5344. 1546.     2314.     8375.
     7     7 0.182   5344. 1507.     2391.     8297.
     8     8 0.183   5344. 1469.     2464.     8223.
     9     9 0.185   5343. 1433.     2534.     8152.
    10    10 0.187   5343. 1399.     2601.     8085.
    # ℹ 90 more rows
    # ℹ Use `print(n = ...)` to see more rows
    

    it is a tibble (fancy data frame) that is suited to plotting with ggplot() and manipulation with dplyr etc.

    Now for the plot

    library("ggplot2")
    
    fv |>
      ggplot(aes(x = x, y = .fitted)) + # setup
      geom_ribbon(                      # add credible interval
        aes(ymax = .upper_ci, ymin = .lower_ci),
        fill = "black",
        alpha = 0.2
      ) +
      geom_line() +                     # add estimated smooth
      geom_point(                       # add your data
        data = dat,
        aes(x = x, y = y)
      )
    

    which produces

    enter image description here

    There are other ways. You could evaluate the smooth (adding on the intercept and a credible interval, and use the draw() method to plot it and then add on the data as above

    m |>
      smooth_estimates(select = "s(x)") |> # evaluate smooth
      add_constant(model_constant(m)) |>   # add on intercept
      add_confint() |>                     # add on credible interval
      draw() +                             # base plot
      geom_point(                          # add your data
        data = dat,
        aes(x = x, y = y)
      )
    

    which gets you a bit more info automagically added to the plot if you like that sort of thing

    enter image description here

    With {gratia} I've focused on two ends of the spectrum

    1. Providing canned draw() methods for the basic plots of smooths, which have some functionality for changing the y-axis scale (via arguments fun and constant in gratia:::draw.smooth_estimates() for example.)
    2. Providing lower level tools so that users can create their own plots with a little ggplot2 knowledge.

    The second example I showed is a bit of a hybrid; I used the draw() method for smooth_estimates(), which adds a few things, but you could almost as easily just plotted the object returned smooth_estimates() with:

    m |>
      smooth_estimates(select = "s(x)") |> # evaluate smooth
      add_constant(model_constant(m)) |>   # add on intercept
      add_confint() |>                     # add on credible interval
      ggplot(aes(x = x, y = .fitted)) +    # setup
      geom_ribbon(                         # add credible interval
        aes(ymax = .upper_ci, ymin = .lower_ci),
        fill = "black",
        alpha = 0.2
      ) +
      geom_line() +                        # add estimated smooth
      geom_point(                          # add your data
        data = dat,
        aes(x = x, y = y)
      )
    

    Once I have got the rest of the package where I want it to be, I do have plans to add more canned plotting functionality and also some ggplot geoms. But I felt it important to first return objects that make it easy to do what the user wants and then build out this other functionality when time permits.