r, rlang, fable, tidyverts

Dynamically insert variables into a fable model using rlang


I am trying to dynamically insert variables into a fable model.

Data

library(dplyr)
library(fable)
library(stringr)

df <- tsibbledata::aus_retail %>% 
  filter(State == "Victoria", Industry == "Food retailing") %>% 
  mutate(reg_test = rnorm(441, 5, 2),
         reg_test2 = rnorm(441, 5, 2))

Note that the tsibble can contain any number of regressors; this example has only two (reg_test and reg_test2). All regressor columns start with reg_.

Problem Function

I have a function where I want to dynamically put the regressor columns into an ARIMA model using the fable package.

test_f <- function(df) {
  var_names <- str_subset(names(df), "reg_") %>% 
    paste0(collapse = "+")
  test <- enquo(var_names)
  df %>% 
    model(ARIMA(Turnover ~ !!test))
}

test_f(df)

# A mable: 1 x 3
# Key:     State, Industry [1]
  State    Industry      `ARIMA(Turnover ~ ~"reg_test+reg_tes~
  <chr>    <chr>         <model>                              
1 Victoria Food retaili~ <NULL model>                         
Warning message:
1 error encountered for ARIMA(Turnover ~ ~"reg_test+reg_test2")
[1] invalid model formula in ExtractVars

I know that it is just putting the string var_names into the formula, which does not work, but I can't figure out how to create var_names in such a way that I can enquo() it correctly.

I read through the Quasiquotation section here and searched SO, but have not found the answer yet.

This question using parse_expr() seemed to get closer, but it was still not what I wanted.

I know that I can use sym() if I have one variable, but I don't know how many reg_ variables there will be and I want to include them all.
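
For reference, the single-variable case mentioned above can be sketched roughly as follows. This is only an illustration: it assumes rlang is attached, `var` is a hypothetical name, and the formula is built outside of model() so that it can be inspected on its own.

```r
library(rlang)

# Hypothetical: the name of a single regressor column, held as a string
var <- "reg_test"

# sym() turns the string into a symbol; !! unquotes it into the quoted formula
f <- expr(Turnover ~ !!sym(var))

# f is an unevaluated call to `~`, i.e. the expression Turnover ~ reg_test
print(f)
```

Inside model(), the same unquoting would presumably be written as ARIMA(Turnover ~ !!sym(var)). It only covers one column, which is the limitation described above.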

Expected Output

By putting in the variables manually, I can show the output that I expect.

test <- df %>% 
  model(ARIMA(Turnover ~ reg_test + reg_test2))
test$`ARIMA(Turnover ~ reg_test + reg_test2)`[[1]]

Series: Turnover 
Model: LM w/ ARIMA(2,1,0)(0,1,2)[12] errors 

Coefficients:
          ar1      ar2     sma1     sma2  reg_test  reg_test2
      -0.6472  -0.3541  -0.4115  -0.0793   -0.0296    -0.6143
s.e.   0.0473   0.0479   0.0520   0.0446    0.5045     0.5273

sigma^2 estimated as 884.9:  log likelihood=-2058.04
AIC=4130.08   AICc=4130.35   BIC=4158.5

I also imagine that there is a better way for me to make the formula in the ARIMA function. If this can fix my problem as well, that will work too.

I appreciate any help!


Solution

  • You're possibly making this a bit more complicated than it needs to be. You can convert a string to a formula by doing as.formula(string), so simply build your formula as a string, convert it to a formula, then feed it to ARIMA. Here's a reprex:

    library(dplyr)
    library(fable)
    library(stringr)
    
    df <- tsibbledata::aus_retail %>% 
      filter(State == "Victoria", Industry == "Food retailing") %>% 
      mutate(reg_test = rnorm(441, 5, 2),
             reg_test2 = rnorm(441, 5, 2))
    
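    test_f <- function(df)  {
        # Collect every regressor column (names containing "reg_"), joined with " + "
        var_names <- paste0(str_subset(names(df), "reg_"), collapse = " + ")
        # Build the formula from a string, then hand it to ARIMA()
        mod <- model(df, ARIMA(as.formula(paste("Turnover ~", var_names))))
        # Pull the fitted model out of the mable's third column to print its report
        unclass(mod[1, 3][[1]])[[1]]
    }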
    
    test_f(df)
    #> Series: Turnover 
    #> Model: LM w/ ARIMA(2,1,0)(0,1,1)[12] errors 
    #> 
    #> Coefficients:
    #>           ar1     ar2     sma1  reg_test  reg_test2
    #>       -0.6689  -0.376  -0.4765    0.3363     1.0194
    #> s.e.   0.0448   0.045   0.0426    0.4978     0.5436
    #> 
    #> sigma^2 estimated as 883.1:  log likelihood=-2058.28
    #> AIC=4128.56   AICc=4128.76   BIC=4152.91
    

    Created on 2020-04-23 by the reprex package (v0.3.0)
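
    As a possible variation on the string-to-formula step above, base R's reformulate() builds the same kind of formula from a character vector, so no manual pasting is needed. The sketch below uses a hypothetical vector `cols` in place of names(df) so that it runs without the data. Whether the resulting object can be passed to ARIMA() in exactly the same way as the as.formula() call is an assumption that has not been checked here.

```r
# Hypothetical column names standing in for names(df)
cols <- c("State", "Industry", "Month", "Turnover", "reg_test", "reg_test2")

# Keep only the columns whose names start with "reg_"
regressors <- grep("^reg_", cols, value = TRUE)

# Build the formula with Turnover on the left and all regressors on the right
f <- reformulate(regressors, response = "Turnover")
print(f)
```

    Because reformulate() takes the term labels as a vector, the same code should handle any number of reg_ columns.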