Is it possible to get predictions from a GAM object for specific terms from 'partial' newdata, which only provides values for the terms to predict? Running predict.gam with type = 'terms' for specific terms still seems to require me to provide "complete" newdata:
library(mgcv)
#> Loading required package: nlme
#> This is mgcv 1.8-42. For overview type 'help("mgcv-package")'.
library(data.table)
# some data for training a gam (no seed is set, so exact numbers vary between runs):
train = data.table(response = rnorm(100),
a = rnorm(100),
b = rnorm(100),
d = rnorm(100))
mod = gam(response ~ s(a) + s(b) + s(d),data = train)
newdat = data.table(a = -1:1,b = -1:1)
# this fails, because newdat has no column d:
predict(mod,newdata = newdat,type = 'terms',terms = c('s(a)','s(b)'))
#> Warning in predict.gam(mod, newdata = newdat, type = "terms", terms = c("s(a)", : not all required variables have been supplied in newdata!
#> Error in eval(predvars, data, env): object 'd' not found
So predict.gam expects newdata to contain the covariate d, even though it is never used for the requested terms:
# adding any value for d works (note: := adds the column to newdat by reference):
predict(mod,newdata = newdat[,d:=1],type = 'terms',terms = c('s(a)','s(b)'))
#> s(a) s(b)
#> 1 -0.14909452 -0.096316246
#> 2 -0.01305186 0.001326293
#> 3 -0.05200030 0.098968833
#> attr(,"constant")
#> (Intercept)
#> -0.0468407
# results do not depend on the value of d:
predict(mod,newdata = newdat[,d:=10000],type = 'terms',terms = c('s(a)','s(b)'))
#> s(a) s(b)
#> 1 -0.14909452 -0.096316246
#> 2 -0.01305186 0.001326293
#> 3 -0.05200030 0.098968833
#> attr(,"constant")
#> (Intercept)
#> -0.0468407
Created on 2024-04-03 with reprex v2.0.2
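One way to confirm that d plays no part in these two terms: each smooth in a fitted mgcv model stores its label and the covariate(s) it is built from, in the label and term elements of the entries of mod$smooth. The sketch below lists the covariates behind the requested terms. It uses a plain data.frame and a seed only to stay self-contained; those choices are assumptions, not part of the original reprex.

```r
library(mgcv)

set.seed(1)
train <- data.frame(response = rnorm(100), a = rnorm(100),
                    b = rnorm(100), d = rnorm(100))
mod <- gam(response ~ s(a) + s(b) + s(d), data = train)

# Map each smooth's label (e.g. "s(a)") to the covariate(s) it uses
vars_by_term <- setNames(lapply(mod$smooth, function(sm) sm$term),
                         vapply(mod$smooth, function(sm) sm$label, character(1)))

# Covariates actually needed for the two requested terms
needed <- unique(unlist(vars_by_term[c("s(a)", "s(b)")]))
needed
#> [1] "a" "b"
```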
Specifically, I am working with several big GAMs with many different terms. The number and names of the terms vary between the GAMs, but they share some terms for which I need to provide newdata. I am looking for a way to do this that does not depend on "the rest of the GAM" (e.g. the names and number of its other terms), which is not actually used in the prediction.
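A possible workaround, short of a dedicated option, is to pad newdata automatically with every covariate the model needs, so no term names have to be hard-coded per model. The helper complete_newdata() below is hypothetical (not part of mgcv). It assumes the fitted object carries its pred.formula and its model frame (mod$model), as a standard gam() fit does, and it uses the first training value of each missing covariate as filler so that factors get a valid level. Because the requested terms do not use those columns, the filler should not change their predictions, which is what the d = 1 / d = 10000 comparison above shows.

```r
library(mgcv)

# Hypothetical helper: add any covariate that the GAM needs for prediction
# but that 'newdata' lacks. Filler values come from the first row of the
# model frame stored in the fit; they only exist to satisfy predict.gam's
# checks and are not used by the terms being predicted.
complete_newdata <- function(model, newdata) {
  nd <- as.data.frame(newdata)
  needed <- all.vars(model$pred.formula)   # covariates used for prediction
  missing_vars <- setdiff(needed, names(nd))
  for (v in missing_vars) {
    nd[[v]] <- rep(model$model[[v]][1], nrow(nd))
  }
  nd
}

set.seed(1)
train <- data.frame(response = rnorm(100), a = rnorm(100),
                    b = rnorm(100), d = rnorm(100))
mod <- gam(response ~ s(a) + s(b) + s(d), data = train)

newdat <- data.frame(a = -1:1, b = -1:1)
pred <- predict(mod, newdata = complete_newdata(mod, newdat),
                type = "terms", terms = c("s(a)", "s(b)"))
```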
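As pointed out by users langtang and Gavin Simpson, you can simply set newdata.guaranteed = TRUE. This turns off most of predict.gam's checking of newdata (see ?predict.gam), so it is up to you to make sure that newdata contains every variable the requested terms need: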
library(mgcv)
#> Loading required package: nlme
#> This is mgcv 1.8-42. For overview type 'help("mgcv-package")'.
library(data.table)
train = data.table(response = rnorm(100),
a = rnorm(100),
b = rnorm(100),
d = rnorm(100))
mod = gam(response ~ s(a) + s(b) + s(d),data = train)
newdat = data.table(a = -1:1,b = -1:1)
predict(mod,newdata = newdat,type = 'terms',terms = c('s(a)','s(b)'),newdata.guaranteed = TRUE)
#> s(a) s(b)
#> 1 0.034319305 -0.32623950
#> 2 0.004482982 0.18215385
#> 3 -0.025353341 -0.04922302
#> attr(,"constant")
#> (Intercept)
#> 0.04415271
Created on 2024-04-04 with reprex v2.0.2
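For the original use case (several GAMs that share some terms but differ otherwise), the same call can be wrapped so that one partial newdata is passed to every model. predict_shared_terms() is a hypothetical helper and the two models are made-up examples. Because newdata.guaranteed = TRUE skips the completeness check, the sketch assumes newdata holds every covariate used by the requested terms.

```r
library(mgcv)

# Hypothetical helper: predict the shared smooth terms from each model in a
# list, supplying only the covariates those terms use.
predict_shared_terms <- function(models, newdata, terms) {
  lapply(models, function(m) {
    predict(m, newdata = newdata, type = "terms", terms = terms,
            newdata.guaranteed = TRUE)
  })
}

set.seed(2)
n <- 100
train <- data.frame(response = rnorm(n), a = rnorm(n), b = rnorm(n),
                    d = rnorm(n), e = rnorm(n))

# Two models that share s(a) and s(b) but differ in their remaining terms
models <- list(
  m1 = gam(response ~ s(a) + s(b) + s(d), data = train),
  m2 = gam(response ~ s(a) + s(b) + s(e), data = train)
)

newdat <- data.frame(a = -1:1, b = -1:1)
preds <- predict_shared_terms(models, newdat, terms = c("s(a)", "s(b)"))
```

Each element of preds is then a matrix with columns "s(a)" and "s(b)" and a "constant" attribute, like the output above.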