rtime-seriesforecast

How to reconstruct Arima predictions from the coefficients?


I have a basic "regression with ARIMA errors" model. It has only two coefficients: one for the covariate and one for an AR1 term.

My client wants me to code the model in Excel (groan).

I can't, however, seem to reproduce the fitted values from the coefficients. Please see this reprex. What am I doing wrong? See how fitted is different from fitted2. Thank you.

library(tidyverse)
library(forecast)
#> Registered S3 method overwritten by 'quantmod':
#>   method            from
#>   as.zoo.data.frame zoo

dat <- structure(list(y = c(1.78991103434018, 1.49879933964446, 0.971918281919186, 
                     0.937808354585886, 1.04805962874144, 0.951319584533088, 0.816977148299322, 
                     0.919107705606102, 0.904677832125957, 0.993996660003134, 1.09903781760707, 
                     1.13452389281912, 1.20954064355947, 1.21259769951234, 1.35795597184542
), x = c(109.523396102789, 82.4545796349974, 22.0326643499854, 
         57.0307677821589, 68.9460376594862, 52.5101835657671, 41.7426519049455, 
         54.9933617116061, 42.9692305014876, 55.5927443079887, 62.0082700546652, 
         56.9579188098397, 58.7357687813304, 61.9314235502785, 78.0286524773229
)), row.names = c(NA, -15L), class = c("tbl_df", "tbl", "data.frame"
))

model <- auto.arima(
  ts(dat$y, frequency = 12),
  xreg = as.matrix(dat[, "x"])
)

summary(model)
#> Series: ts(dat$y, frequency = 12) 
#> Regression with ARIMA(1,0,0) errors 
#> 
#> Coefficients:
#>          ar1  intercept       x
#>       0.8399     0.7652  0.0073
#> s.e.  0.1485     0.1675  0.0012
#> 
#> sigma^2 = 0.0101:  log likelihood = 14.24
#> AIC=-20.49   AICc=-16.49   BIC=-17.66
#> 
#> Training set error measures:
#>                       ME       RMSE        MAE       MPE     MAPE      MASE
#> Training set -0.01916312 0.08988014 0.06145025 -2.555815 5.787479 0.1471733
#>                   ACF1
#> Training set 0.1275706

dat$fitted <- model |> 
  fitted() |> 
  as.numeric()

dat$fitted2 <- NA_real_

for (i in 2:nrow(dat)) {
  dat$fitted2[i] <- 
    coef(model)["intercept"] +
    coef(model)["ar1"] * dat$y[i - 1] +
    coef(model)["x"] * dat$x[i]
}

dat
#> # A tibble: 15 × 4
#>        y     x fitted fitted2
#>    <dbl> <dbl>  <dbl>   <dbl>
#>  1 1.79  110.   1.67    NA   
#>  2 1.50   82.5  1.56     2.87
#>  3 0.972  22.0  1.04     2.19
#>  4 0.938  57.0  1.22     2.00
#>  5 1.05   68.9  1.06     2.06
#>  6 0.951  52.5  0.963    2.03
#>  7 0.817  41.7  0.904    1.87
#>  8 0.919  55.0  0.955    1.85
#>  9 0.905  43.0  0.871    1.85
#> 10 0.994  55.6  1.03     1.93
#> 11 1.10   62.0  1.07     2.05
#> 12 1.13   57.0  1.08     2.11
#> 13 1.21   58.7  1.16     2.15
#> 14 1.21   61.9  1.23     2.24
#> 15 1.36   78.0  1.33     2.36

Created on 2025-07-07 with reprex v2.1.1


Solution

  • You've misunderstood the model. It is a regression with AR(1) errors, not an ARX model. See https://robjhyndman.com/hyndsight/arimax/ for the details.

    The model you have fitted can be written as:

    𝑦𝑡=𝛽0+𝛽1⁢𝑥+𝜂𝑡

    𝜂𝑡=𝜙⁢𝜂𝑡−1+𝜀𝑡

    So the fitted values are 𝛽0 + 𝛽1⁢𝑥 + 𝜙⁢𝜂𝑡-1

    The following code shows the calculation.

    dat |>
      mutate(
        fitted = fitted(model),
        res = residuals(model, type = "regression"),
        fitted2 = coef(model)["intercept"] +
            coef(model)["x"] * x +
            coef(model)["ar1"] * c(NA, res[-15])
      )
    #> # A tibble: 15 × 5
    #>        y     x fitted      res fitted2
    #>    <dbl> <dbl>  <dbl>    <dbl>   <dbl>
    #>  1 1.79  110.   1.67   0.222    NA    
    #>  2 1.50   82.5  1.56   0.129     1.56 
    #>  3 0.972  22.0  1.04   0.0452    1.04 
    #>  4 0.938  57.0  1.22  -0.246     1.22 
    #>  5 1.05   68.9  1.06  -0.223     1.06 
    #>  6 0.951  52.5  0.963 -0.199     0.963
    #>  7 0.817  41.7  0.904 -0.254     0.904
    #>  8 0.919  55.0  0.955 -0.249     0.955
    #>  9 0.905  43.0  0.871 -0.176     0.871
    #> 10 0.994  55.6  1.03  -0.179     1.03 
    #> 11 1.10   62.0  1.07  -0.121     1.07 
    #> 12 1.13   57.0  1.08  -0.0483    1.08 
    #> 13 1.21   58.7  1.16   0.0137    1.16 
    #> 14 1.21   61.9  1.23  -0.00670   1.23 
    #> 15 1.36   78.0  1.33   0.0206    1.33
    

    R actually uses a state space model with a diffuse prior so it is possible to compute the fitted value for the first time point as well, but the calculation is too complicated to easily reproduce in a few lines of code. So I've left it missing.