I have a training data table in R whose columns change from time to time. For example, at the moment the data table has the following column names:
library(mgcv)
## names(dt.train) currently returns
## c("DE", "DEWind", "DESolar", "DEConsumption", "DETemperature",
##   "DENuclear", "DELignite")
Now I want to fit a Generalized Additive Model (GAM) with integrated smoothness estimation that predicts the DE price. At the moment I fit the model as follows:
fitModel <- mgcv::gam(DE ~ s(DEWind)+s(DESolar)+s(DEConsumption)+s(DETemperature)+
s(DENuclear)+s(DELignite),
data = dt.train)
The column names are currently hard-coded, but I don't want to edit them every time the columns change. I would like the program to recognize which columns are present and fit the model with those columns. So I would like something like this (which works for stats::lm() or stats::glm()):
fitModel <- mgcv::gam(DE ~ .-1, data = dt.train)
Unfortunately, this doesn't work with gam().
I wouldn't recommend doing this, for statistical reasons, but…
nms <- c("DE", "DEWind", "DESolar", "DEConsumption", "DETemperature",
"DENuclear", "DELignite")
## typically you'd get those names with
## nms <- names(dt.train)
## identify the response
resp <- 'DE'
## filter out response from `nms`
nms <- nms[nms != resp]
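The same filtering can also be done in one step with base R's setdiff(), reading the names straight from the data. This is a minimal sketch: the data frame below is a hypothetical stand-in for dt.train that only reproduces the column names from the question (its values are placeholder zeros).

```r
## Hypothetical stand-in for dt.train: only the column names matter here
dt.train <- as.data.frame(matrix(0, nrow = 2, ncol = 7,
  dimnames = list(NULL, c("DE", "DEWind", "DESolar", "DEConsumption",
                          "DETemperature", "DENuclear", "DELignite"))))
resp <- "DE"
## setdiff() drops the response and keeps the remaining columns in order
nms <- setdiff(names(dt.train), resp)
print(nms)
```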
Create the right-hand side of the formula by pasting the s( and ) bits around each name and concatenating the strings, separated by +:
rhs <- paste('s(', nms, ')', sep = '', collapse = ' + ')
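paste0() is shorthand for paste(..., sep = ''), so it builds the same string. The same pattern also lets you paste extra arguments into every s() term; as a sketch, the basis dimension k = 5 below is an arbitrary placeholder, not a recommended value.

```r
nms <- c("DEWind", "DESolar", "DEConsumption", "DETemperature",
         "DENuclear", "DELignite")
## Same right-hand side as paste(..., sep = '', collapse = ' + ')
rhs <- paste0("s(", nms, ")", collapse = " + ")
## Extra s() arguments can be pasted in the same way;
## k = 5 is an arbitrary placeholder basis dimension
rhs_k <- paste0("s(", nms, ", k = 5)", collapse = " + ")
print(rhs)
print(rhs_k)
```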
which gives us
> rhs
[1] "s(DEWind) + s(DESolar) + s(DEConsumption) + s(DETemperature) + s(DENuclear) + s(DELignite)"
Then you can add the response and ~:
fml <- paste(resp, '~', rhs)
which gives
> fml
[1] "DE ~ s(DEWind) + s(DESolar) + s(DEConsumption) + s(DETemperature) + s(DENuclear) + s(DELignite)"
Finally coerce to a formula object:
fml <- as.formula(fml)
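As an alternative to building the full string and calling as.formula(), base R's reformulate() takes the term labels and the response name and returns the formula object directly. A minimal sketch using the same names:

```r
nms <- c("DEWind", "DESolar", "DEConsumption", "DETemperature",
         "DENuclear", "DELignite")
## reformulate() joins the term labels with + and adds the response on the left
fml <- reformulate(paste0("s(", nms, ")"), response = "DE")
print(fml)
```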
which gives
> fml
DE ~ s(DEWind) + s(DESolar) + s(DEConsumption) + s(DETemperature) +
s(DENuclear) + s(DELignite)
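The formula can then be passed to gam() like any hand-written one. The sketch below wraps the steps in a small function and fits it end to end. gam_formula() is a hypothetical helper name (not part of mgcv), and the data are simulated random values standing in for dt.train, not real market data.

```r
library(mgcv)

## Hypothetical helper (not part of mgcv): one s() term per non-response column
gam_formula <- function(data, resp) {
  nms <- setdiff(names(data), resp)
  reformulate(paste0("s(", nms, ")"), response = resp)
}

## Simulated placeholder data using the question's column names
set.seed(1)
n <- 200
dt.train <- data.frame(DEWind = runif(n), DESolar = runif(n),
                       DEConsumption = runif(n), DETemperature = runif(n),
                       DENuclear = runif(n), DELignite = runif(n))
dt.train$DE <- with(dt.train,
                    sin(2 * pi * DEWind) + DESolar^2 + rnorm(n, sd = 0.2))

## Whatever columns dt.train has, every non-response column gets a smooth
fml <- gam_formula(dt.train, "DE")
fitModel <- gam(fml, data = dt.train)
summary(fitModel)
```

Because the predictor names are read from names(dt.train) at run time, adding or dropping a column changes the fitted model without editing the code.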