I fitted a gam (mgcv) where y is modelled as a function of the time (t) by factor x1 and by factor x2 in separate smooth terms. I also included r, which should mimic a random factor.
library(mgcv)
dat <- data.frame(y = rnorm(1500,500,100),
t = rep(c(1:5),100),
x1 = as.factor(rep(letters[1:3],length.out=1500)),
x2 = as.factor(rep(LETTERS[1:5],length.out=1500)),
r = as.factor(rep(letters[4:26],length.out=1500)))
fit <- gam(y~s(t,by=x1,k=4)+s(x1,bs="re")+s(t,by=x2,k=4)+s(x2,bs="re")+s(r,bs="re"),data=dat,
family="gaussian",method="REML")
I now wanted to predict y for a level of x1 across all levels of x2. However, a new level in x2 will lead to NAs in the prediction.
pred <- data.frame(predict(fit,type="response",newdata = data.frame(t=seq(0,5,0.1),
x1="a",
x2="new_level",
r="new_level")
,exclude = c("x2","r"),se.fit = T))
pred$fit
If I set x2 to one of the levels included in x2, it will predict values which will vary depending on the value/level I use even though I specified it to be excluded. Interestingly, the r takes new levels without leading to NAs. So, I figured this is connected to the "by" smooth terms. However, I could not come up with a solution. One might be fitting tow models including only one of the smooth terms w ith by, each? Would that be ok?
It sounds like you just need to specify exclude
properly. For example,
library("gratia")
library("mgcv")
su_eg4 <- data_sim("eg4", n = 400, dist = "normal", scale = 2, seed = 1)
m <- gam(y ~ fac + s(x2, by = fac) + s(x0, by = fac),
data = su_eg4, method = "REML")
This model has the following smooth terms
> smooths(m)
[1] "s(x2):fac1" "s(x2):fac2" "s(x2):fac3" "s(x0):fac1" "s(x0):fac2" "s(x0):fac3"
and hence if you want to exclude all the smooths containing s(x0)
here, you need to use their full names. The factor by
terms are named following this template: s(<x>):<f><level>
, where
<x>
is the smooth covariate(s),<f>
is the name of the factor<level>
is the level of <f>
that this smooth is for.If you have lots of levels, it might get tedious to write all that out, so we can search for the relevant string
sms <- smooths(m)
excl <- grepl("x0", sms)
and then adjust the exclude
argument value accordingly
exclude = sms[excl]
as the sms[excl]
evaluates to
> sms[excl]
[1] "s(x0):fac1" "s(x0):fac2" "s(x0):fac3"
In your case it seems you want to use
sms <- smooths(fit)
excl <- grepl("x2", sms) | grepl("s\\(r\\)", sms)
to capture the random intercept for r
as well as any smooths that involve x2
.
I actually think your model would be more simply specified using:
m <- <- gam(y ~
s(t) + # "average" smooth of t
s(t, x1, bs = "sz") + # smooth difference from `s(t)` for each level of x1
s(t, x2, bs = "sz") + # smooth difference from `s(t)` for each level of x2
s(r, bs = "re"), # random intercept term for `r`
family = "gaussian", method = "REML")
Then you would just be able to use
exclude = c("s(t,x2)", "s(r)")
to exclude the smooths you want to exclude.
Note that with the sz
basis, the group means are included in the basis, hence no parametric effects of x1
or x2
or their ranef equivalents are included in the model shown.