
How does the R forecast package treat missing values in ARIMA (auto.arima function)?


I am fitting an ARIMA model in R to data with missing values. It is financial data, so the missing values fall on public holidays and weekends and are therefore not missing completely at random. I am still deciding how to handle them.

However, the function itself runs without error, so ARIMA must treat the missing values somehow. I can't find in the documentation what exactly happens to them when the model is fitted: are they dropped, imputed, or something else?

best_fit = auto.arima(data_vector, stationary = is_stationary, ic = "bic", stepwise = FALSE, allowmean = TRUE, allowdrift = TRUE, approximation = FALSE)

Does anybody know what auto.arima does with missing values by default?


Solution

  • forecast::auto.arima() uses stats::arima() to fit the models. stats::arima() takes a state space approach to ARIMA models and computes the likelihood using the Kalman filter. See the help file for stats::arima(), which contains an explanation and several references. In particular, Jones (1980) explains how missing values are handled in a Kalman filter. So it neither drops them nor imputes them. At each time point with a missing value, the filter still runs its prediction step but skips the updating step, so the likelihood is computed from the available data only.
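
A minimal sketch of this behaviour, calling stats::arima() directly (the function auto.arima() relies on) so that no extra package is needed. The simulated AR(1) series, the seed, and the two-in-seven gap pattern are assumptions made up for illustration; they only loosely mimic weekend gaps and are not the asker's data.

```r
# Simulate a hypothetical AR(1) series (illustrative data only).
set.seed(42)
y <- arima.sim(model = list(ar = 0.6), n = 300)

# Blank out two of every seven points to mimic weekend-style gaps (assumed pattern).
y_na <- y
y_na[seq(6, 300, by = 7)] <- NA
y_na[seq(7, 300, by = 7)] <- NA

# stats::arima() accepts NA values; method = "ML" evaluates the exact
# likelihood with the Kalman filter.
fit_na <- arima(y_na, order = c(1, 0, 0), method = "ML")

# The series keeps its full time index, but only observed points
# contribute to the likelihood.
n_total <- length(y_na)        # length of the series, gaps included
n_obs   <- sum(!is.na(y_na))   # number of observed values
n_used  <- fit_na$nobs         # observations used in the fit

print(fit_na)
cat("series length:", n_total, "\n")
cat("observed values:", n_obs, "\n")
cat("observations used by arima():", n_used, "\n")
cat("residual vector length:", length(residuals(fit_na)), "\n")
```

The count of observations used should match the number of non-missing values, while the residual vector keeps the full length of the series. That is consistent with the gaps being stepped over rather than deleted or filled in.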