Suppose I use the fable package to estimate the following model on daily data covering 2019, where x is an exogenous explanatory variable. The terms pdq(p = 1, d = 0, q = 0) and PDQ(P = 0, D = 0, Q = 0) mean that this is an auto-regressive model.
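library(tidyverse)
library(tsibble)  # provides tsibble(); not attached by library(fable)
library(fable)
# load() takes the file path as its first argument and restores the saved objects
load("Some data.RData")
# Fit an AR(1) model with x as an exogenous regressor and no seasonal terms
fit <- dta_2019 %>%
  tsibble() %>%
  model(ar = ARIMA(y ~ x + pdq(p = 1, d = 0, q = 0) + PDQ(P = 0, D = 0, Q = 0)))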
Now I need to use that model to run a forecast on daily data for the year 2020 but suppose that the data begin in February 2020.
forecast_2020 <- fit %>%
forecast(new_data = tsibble(dta_2020))
My understanding is that the value of the lag of y for the forecast, which is required given that this is an auto-regressive model, will be the last value observed in the estimation dataset (dta_2019). Can I initialize the value of y to something else? I have tried including a row in dta_2020 that contains, in this particular example, the observation for January 31st, but that causes the forecast to begin on January 31st.
For the ARIMA model (using fable::ARIMA()) you will also need to forecast January 2020 to obtain the February 2020 forecasts of interest, because forecasts are produced sequentially from the end of the estimation data. If the exogenous regressor is available for both months, then the forecasts can be computed. Providing future values of your exogenous regressor x is required, but future values of y are not needed for forecasting.
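As a rough sketch of that workflow: the data below are simulated stand-ins, and the column name date, the object x_2020, and all numeric values are assumptions for illustration only, not taken from the question. The idea is to pass x for both January and February 2020 as new_data and then keep only the February rows.

```r
library(dplyr)
library(tsibble)
library(fable)

set.seed(1)

# Simulated stand-in for the 2019 estimation data (hypothetical values).
dta_2019 <- tibble(
  date = seq(as.Date("2019-01-01"), as.Date("2019-12-31"), by = "day"),
  x = rnorm(365)
) %>%
  mutate(y = 2 + 0.5 * x + as.numeric(arima.sim(list(ar = 0.6), n = 365))) %>%
  as_tsibble(index = date)

fit <- dta_2019 %>%
  model(ar = ARIMA(y ~ x + pdq(p = 1, d = 0, q = 0) + PDQ(P = 0, D = 0, Q = 0)))

# Future regressor values start the day after the estimation data end,
# so January 2020 is included even though only February is of interest.
x_2020 <- tibble(
  date = seq(as.Date("2020-01-01"), as.Date("2020-02-29"), by = "day"),
  x = rnorm(60)
) %>%
  as_tsibble(index = date)

# Forecast both months, then keep only the February 2020 forecasts.
forecast_2020 <- fit %>%
  forecast(new_data = x_2020) %>%
  filter(date >= as.Date("2020-02-01"))
```

Only x appears in x_2020; no y column is supplied for the forecast period. With real data, x_2020 would be replaced by the actual regressor values for January and February 2020.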