
How to fit a Generalized Additive Model with gam() using all columns as predictors (without hard-coding the model formula)


I have a training data table in R whose columns change from time to time. For example, the data table currently has the following column names:

library(mgcv)
## current column names of dt.train, i.e. names(dt.train)
cols.train <- c("DE", "DEWind", "DESolar", "DEConsumption", "DETemperature", 
                "DENuclear", "DELignite")

Now I want to fit a Generalized Additive Model (GAM) with integrated smoothness estimation that predicts the DE price. Currently I fit the model as follows:

fitModel <- mgcv::gam(DE ~ s(DEWind)+s(DESolar)+s(DEConsumption)+s(DETemperature)+
                           s(DENuclear)+s(DELignite), 
                      data = dt.train)

The column names are currently hard-coded, and I don't want to edit the formula every time the columns change. Instead, I would like the program to detect which columns exist and fit the model with all of them. So I would like something like this (which works for stats::lm() or stats::glm()):

fitModel <- mgcv::gam(DE ~ .-1, data = dt.train)

Unfortunately, this doesn't work with gam().


Solution

  • I wouldn't recommend doing this, for statistical reasons, but here is how:

    nms <- c("DE", "DEWind", "DESolar", "DEConsumption", "DETemperature", 
                  "DENuclear", "DELignite")
    ## typically you'd get those names as
    ## nms <- names(dt.train)
    
    ## identify the response
    resp <- 'DE'
    ## filter out response from `nms`
    nms <- nms[nms != resp]
    

    Create the right-hand side of the formula by pasting the s( and ) bits around each name and concatenating the strings, separated by +:

    rhs <- paste('s(', nms, ')', sep = '', collapse = ' + ')
    

    which gives us

    > rhs
    [1] "s(DEWind) + s(DESolar) + s(DEConsumption) + s(DETemperature) + s(DENuclear) + s(DELignite)"
    

    Then add the response and the ~:

    fml <- paste(resp, '~', rhs)
    

    which gives

    > fml
    [1] "DE ~ s(DEWind) + s(DESolar) + s(DEConsumption) + s(DETemperature) + s(DENuclear) + s(DELignite)"
    

    Finally, coerce the string to a formula object:

    fml <- as.formula(fml)
    

    which gives

    > fml
    DE ~ s(DEWind) + s(DESolar) + s(DEConsumption) + s(DETemperature) + 
        s(DENuclear) + s(DELignite)
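The resulting formula can then be passed directly to gam(). The sketch below is an illustration under stated assumptions, not part of the original answer: it wraps the steps above in a hypothetical helper, gam_all_smooths(), which uses base R's reformulate() to build the same kind of formula without the manual paste()/as.formula() step. It fits that formula on simulated data standing in for dt.train, with made-up values and only three predictors.

```r
library(mgcv)  # provides gam() and s(); mgcv ships with R as a recommended package

## Hypothetical helper (not part of mgcv): fit a GAM with a smooth s()
## of every column in `data` except the response column `resp`.
gam_all_smooths <- function(data, resp, ...) {
  preds <- setdiff(names(data), resp)            # all non-response columns
  fml <- reformulate(paste0("s(", preds, ")"),   # "s(x1)", "s(x2)", ...
                     response = resp)            # resp ~ s(x1) + s(x2) + ...
  mgcv::gam(fml, data = data, ...)               # extra arguments go to gam()
}

## Simulated stand-in for dt.train; the values are invented for illustration.
set.seed(1)
n <- 200
sim <- data.frame(DEWind = runif(n), DESolar = runif(n),
                  DEConsumption = runif(n))
sim$DE <- sin(2 * pi * sim$DEWind) + sim$DESolar^2 + rnorm(n, sd = 0.1)

fit <- gam_all_smooths(sim, resp = "DE")
formula(fit)  # the formula gam() actually used
```

names() behaves the same on a data.table as on a data.frame, so the helper should also accept dt.train directly. Note that s() uses a default basis dimension, so gam() stops with an error when a predictor has too few distinct values; non-numeric or near-constant columns would likely need to be removed from preds or handled separately.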