I have a ggplot where I have used the geom_smooth(method='gam') function. I was wondering if there is a way to extract model parameters, like coefficients and deviance explained.
Similar to the summary() of mgcv::gam()'s output:
> summary(gam)
Family: gaussian
Link function: identity
Formula:
mAODscale ~ s(numDate, bs = "cr")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.041461 0.002198 18.86 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(numDate) 8.731 8.979 74.54 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.379 Deviance explained = 38.4%
GCV = 0.0053239 Scale est. = 0.0052765 n = 1092
test <- structure(list(numDate = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4,
5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10, 10, 10, 11,
11, 11, 12, 12, 12, 13, 13, 13, 14, 14, 14, 15, 15, 15, 16, 16,
16, 17, 17, 17, 18, 18, 18, 19, 19, 19, 20, 20, 20, 21, 21, 21,
22, 22, 22, 23, 23, 23, 24, 24, 24, 25, 25, 25, 26, 26, 26, 27,
27, 27, 28, 28, 28, 29, 29, 29, 30, 30, 30, 31, 31, 31, 32, 32,
32, 33, 33, 33, 34), mAODscale = c(0.0388813764134284, 0.0461877433707656,
0.096886995092774, 0.0371438764134382, 0.0394419100374392, 0.0893994950927635,
0.0413855430800965, 0.0418085767040992, 0.101328661759439, 0.0428313764134316,
0.0403835767041016, 0.0978328284261067, 0.0444813764134295, 0.0566460767040979,
0.136811995092771, 0.0404647097467716, 0.0461127433707702, 0.104611995092768,
0.0391105430800991, 0.0388294100374367, 0.0883828284261057, 0.0377355430801032,
0.0334335767040983, 0.0744786617594428, 0.0346647097467638, 0.0316752433707705,
0.0691911617594343, 0.0365730430800966, 0.0329127433707725, 0.0721203284261094,
0.0337897097467703, 0.0271960767041008, 0.0568703284261005, 0.0321188764134348,
0.0226835767040967, 0.0467286617594311, 0.0389522097467676, 0.0317169100374315,
0.0724911617594444, 0.0374147097467699, 0.0301335767041024, 0.066378661759444,
0.0359688764134347, 0.0245710767041061, 0.0492828284261009, 0.0340355430801083,
0.0208835767040938, 0.0401828284261114, 0.033193876413435, 0.0170627433707722,
0.0363119950927739, 0.0320272097467722, 0.0147294100374324, 0.0298161617594417,
0.0305563764134433, 0.0123752433707693, 0.0200744950927714, 0.0294522097467649,
0.00966691003743847, 0.0144453284261061, 0.029439709746768, 0.00845441003743019,
0.0127494950927769, 0.029218876413438, 0.00767941003742578, 0.0118203284260971,
0.0283438764134303, 0.00608774337077023, 0.00807032842610056,
0.027393235387791, 0.00498582029383954, 0.00612513611841337,
0.0261313764134314, 0.004612743370771, 0.00541616175944171, 0.0260813764134298,
0.00472524337077118, 0.00372449509276862, 0.0256522097467666,
0.00535024337077061, 0.00356616175943714, 0.0250313764134376,
0.0161044100374284, 0.00060366175944182, 0.0241147097467689,
0.025246076704093, -0.00540050490722876, 0.0233022097467739,
0.022454410037426, -0.00630883824055672, 0.0208772097467715,
0.0152252433707645, -0.00819217157389573, 0.0194897097467646,
-0.00307475662923196, -0.00896300490722979, 0.0178938764134386,
0.00323774337076088, -0.00836717157389444, 0.0153022097467641
)), row.names = c(NA, -100L), class = c("data.table", "data.frame"
), .internal.selfref = <pointer: 0x000002cacef91f60>)
Note: this is only using a subset of my dataset.
p <- ggplot(test50, aes(x=numDate, y=mAODscale)) +
geom_point() +
stat_smooth(method='gam',
formula=y~s(x,bs="cs",fx=TRUE,k=10))
When specifying a geom_smooth
model within ggplot
, you use the symbols x
and y
in the formula to represent the variables mapped to the x and y aesthetics (as specified in aes
). In your case, x
is numDate
and y
is mAODscale
.
Since there are no transformations going on in the data prior to the model being run (other than this change in the variable names), all you need to do is substitute the symbols x
and y
in the formula for the appropriate variable names in your original data frame. Then use that formula to run a gam
. The summary of this model is the result you are looking for:
library(mgcv)
mod <- gam(mAODscale ~ s(numDate, bs = "cs", fx = TRUE, k = 10), data = test)
summary(mod)
#>
#> Family: gaussian
#> Link function: identity
#>
#> Formula:
#> mAODscale ~ s(numDate, bs = "cs", fx = TRUE, k = 10)
#>
#> Parametric coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.031871 0.001839 17.33 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Approximate significance of smooth terms:
#> edf Ref.df F p-value
#> s(numDate) 9 9 13.25 3.01e-13 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> R-sq.(adj) = 0.527 Deviance explained = 57%
#> GCV = 0.00037584 Scale est. = 0.00033826 n = 100
The model itself is not stored in the ggplot object. It is built on-the-fly and discarded during the ggplot_build
process. We can see that a direct call to gam
gives the same result as the model used in ggplot
if we set a trace
on gam
and get it to print out a summary of its output object to the console whenenver it is called:
trace(gam, quote(print(summary(object))), at = 40)
#> Tracing function "gam" in package "mgcv"
#> [1] "gam"
Now when we print our ggplot, we should get a summary of the gam
that ggplot
uses to create the smooth:
ggplot(test, aes(x=numDate, y=mAODscale)) +
geom_point() +
stat_smooth(method='gam',
formula=y ~ s(x, bs = "cs", fx = TRUE, k = 10))
#> Tracing method(formula, data = data, weights = weight, method = "REML") step 40
#>
#> Family: gaussian
#> Link function: identity
#>
#> Formula:
#> y ~ s(x, bs = "cs", fx = TRUE, k = 10)
#>
#> Parametric coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 0.031871 0.001839 17.33 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Approximate significance of smooth terms:
#> edf Ref.df F p-value
#> s(x) 9 9 13.25 3.01e-13 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> R-sq.(adj) = 0.527 Deviance explained = 57%
#> -REML = -220.56 Scale est. = 0.00033826 n = 100
You can see that this is the same as our direct call, except that the variable names have changed to x
and y
.
Don't forget to untrace(gam)
when you are done.
Created on 2023-10-03 with reprex v2.0.2