time-series cross-validation fable tidyverts

selecting lagged predictors with TSLM using AICc


I am trying to determine which lagged predictors to include in my time series model, so I fitted TSLM models with up to lag 3 of the independent variable:

lag_models <- data_train %>%
  model(
    ts_lag_0 = TSLM(Y ~ X),
    ts_lag_1 = TSLM(Y ~ X + lag_X_01),
    ts_lag_2 = TSLM(Y ~ X + lag_X_01 + lag_X_02),
    ts_lag_3 = TSLM(Y ~ X + lag_X_01 + lag_X_02 + lag_X_03)
  )

data_train contains cross-validation data.
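
(In case it helps, here is a simplified sketch of how the lagged columns and the folds were set up; the source object name and the window sizes below are placeholders.)

library(tsibble)
library(dplyr)

# Simplified setup sketch: lagged copies of X, then cross-validation folds.
# `data_full` and the window sizes are placeholders.
data_train <- data_full %>%
  mutate(
    lag_X_01 = lag(X, 1),
    lag_X_02 = lag(X, 2),
    lag_X_03 = lag(X, 3)
  ) %>%
  stretch_tsibble(.init = 24, .step = 1)   # adds the .id fold column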

lag_models %>% glance()

Running the code above, I get AIC, AICc, BIC, etc. for each lagged-predictor model within each .id. I am wondering if it's possible to pull out these metrics per model only (one row per model, rather than one per model per .id) without using group_by() and summarize().

Thanks very much.


Solution

  • When using cross-validation, you are estimating a model on every fold/slice of the data. As a result, you will get a set of summary statistics (AIC, AICc, BIC, etc.) for every estimated model. If you were to combine them using group_by() and summarise(), you would be combining summary information from models fitted to different response data, which isn't recommended: information criteria are not comparable when the response data varies (for a directly comparable AICc per model, see the first sketch below).

    If you want to compare the performance of the models under cross-validation, use out-of-sample accuracy measures via accuracy() (see the second sketch below). Examples of cross-validated accuracy evaluation with fable can be found at https://otexts.com/fpp3/tscv.html.
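
    For the AICc comparison, here is a minimal sketch. It assumes an un-sliced training tsibble, called data_train_full below (a placeholder name), that contains the same Y, X and lagged-X columns. Fitting the candidates once on that single series means glance() returns one row per model, so the information criteria are directly comparable.

    library(fable)
    library(dplyr)

    # Sketch: fit each candidate model once on the un-sliced training set.
    # `data_train_full` is a placeholder for your full training tsibble.
    candidate_models <- data_train_full %>%
      model(
        ts_lag_0 = TSLM(Y ~ X),
        ts_lag_1 = TSLM(Y ~ X + lag_X_01),
        ts_lag_2 = TSLM(Y ~ X + lag_X_01 + lag_X_02),
        ts_lag_3 = TSLM(Y ~ X + lag_X_01 + lag_X_02 + lag_X_03)
      )

    # One row per model: AIC, AICc, BIC, etc. can now be compared directly.
    candidate_models %>% glance()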
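
    For the cross-validated comparison, here is a sketch of the accuracy() route, in the spirit of the fpp3 link above. It assumes a tsibble called data_future (a placeholder) that supplies X and its lags for the step(s) after each fold, keyed by the same .id as data_train, and that data_full (also a placeholder) is the complete series used for scoring.

    # Sketch: out-of-sample accuracy across the cross-validation folds.
    # `data_future` and `data_full` are placeholders, not objects from the question.
    cv_forecasts <- lag_models %>%
      forecast(new_data = data_future)   # new_data must carry X, its lags, and .id

    # Accuracy measures (RMSE, MAE, ...) per model, averaged over the folds.
    cv_forecasts %>%
      accuracy(data_full)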