I am trying to forecast out-of-sample values using sktime's SquaringResiduals. Here is the code, which works well for in-sample prediction.
from sktime.forecasting.arch import StatsForecastGARCH
from sktime.forecasting.squaring_residuals import SquaringResiduals
def hybridModel(p,q,model):
    out_sample_date = FH(np.arange(12), is_relative=True)
    in_sample_date = FH(df.index, is_relative=False)
    var_fc = StatsForecastGARCH(p=p,q=q)
    sqr = SquaringResiduals(forecaster=model, residual_forecaster=var_fc,initial_window=int(len(df)))
    sqr = sqr.fit(df, fh=in_sample_date)
    # y_pred2 = sqr.predict(out_sample_date) #out sample prediction
    y_pred = sqr.predict(in_sample_date) #in sample prediction
    fig,ax=plot_series(df, y_pred, labels=["passenger", "y_pred"])
    return sqr,fig,y_pred,error_matrix(df,y_pred)
sqr,fig1,y_pred1,matrix1= hybridModel(1,1,forecaster)
Now I try to forecast out of sample:
y_pred2 = sqr.predict(out_sample_date) #out sample prediction
> ValueError: A different forecasting horizon `fh` has been provided
> from the one seen already in `fit`, in this instance of
> SquaringResiduals. If you want to change the forecasting horizon,
> please re-fit the forecaster. This is because the fitting of the
> forecaster SquaringResiduals depends on `fh`.
So I changed
sqr = sqr.fit(df, fh=in_sample_date)
to
sqr = sqr.fit(df)
and got:
> ValueError: The forecasting horizon `fh` must be passed to `fit` of
> SquaringResiduals, but none was found. This is because fitting of the
> forecaster SquaringResiduals depends on `fh`.
Then I changed
sqr = sqr.fit(df, fh=in_sample_date)
to
sqr = sqr.fit(df, fh=out_sample_date)
and got:
> ValueError: The `window_length` and the forecasting horizon are
> incompatible with the length of `y`. Found `window_length`=84,
> `max(fh)`=11, but len(y)=84. It is required that the window length
> plus maximum forecast horizon is smaller than the length of the time
> series `y` itself.
Then I checked the predict function for another model. For a non-hybrid model, predict() works well for both in-sample and out-of-sample prediction:
from sktime.forecasting.tbats import TBATS
from sktime.forecasting.base import ForecastingHorizon as FH
import warnings
import numpy as np
import pandas as pd
import mlflow
from sktime.utils import mlflow_sktime as mf
from sktime.utils.plotting import plot_series
warnings.filterwarnings("ignore")
out_sample_date = FH(np.arange(12), is_relative=True)
in_sample_date = FH(df.index, is_relative=False)
forecaster = TBATS(
    use_box_cox=True,
    use_trend=True,
    use_damped_trend=True,
    sp=12,
    use_arma_errors=True,
    n_jobs=1)
forecaster.fit(df)
y_pred = forecaster.predict(in_sample_date)
y_pred2 = forecaster.predict(out_sample_date)
fig,ax = plot_series(df,y_pred,y_pred2,labels=['passenger','prediction','out_sample_pred'])
Why do in-sample and out-of-sample prediction not work together for SquaringResiduals, and how can I predict both in-sample and out-of-sample values?
sqr = SquaringResiduals(forecaster=model, residual_forecaster=var_fc,initial_window=int(len(df)))
sqr = sqr.fit(df, fh=in_sample_date)
y_pred2 = sqr.predict(out_sample_date) #out sample prediction
Thank you so much for your attention.
The documentation explains that the forecaster is trained on y(t_1), ..., y(t_i) for each i = initial_window, ..., N-steps_ahead, and that each of these fits is used to calculate the residual
r(t_(i+steps_ahead)) := y(t_(i+steps_ahead)) - ŷ(t_(i+steps_ahead))
for that value of i.
The initial_window must be less than or equal to N-steps_ahead to make any forecasts for a positive number of steps_ahead. I believe the reason is as follows. Consider initial_window = N-s, where s is greater than or equal to 0, and steps_ahead = a. In the first iteration of the loop over i (that is, i = N-s), we get:
r(t_(i+steps_ahead)) := y(t_(i+steps_ahead)) - ŷ(t_(i+steps_ahead))
r(t_(N-s+a)) := y(t_(N-s+a)) - ŷ(t_(N-s+a))
Notice that y(t_(N-s+a)) is not known unless N-s+a <= N, or equivalently a <= s, because we don't know the true values at future timestamps.
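The index argument above can be checked with a small, library-free sketch. The helper name computable_residual_indices is hypothetical (it is not part of sktime), and the loop only mirrors the indexing described in this answer, not the library's actual implementation:

```python
def computable_residual_indices(n_obs, initial_window, steps_ahead):
    """Return the 1-based time indices i + steps_ahead at which a residual can be formed.

    Hypothetical helper: it mimics the loop over i described above and keeps
    only the targets whose true value y(t_(i+steps_ahead)) has been observed.
    """
    indices = []
    for i in range(initial_window, n_obs + 1):
        target = i + steps_ahead
        if target <= n_obs:  # observations exist only up to t_N
            indices.append(target)
    return indices


N = 84  # length of the series in the question
print(computable_residual_indices(N, initial_window=84, steps_ahead=11))  # []
print(computable_residual_indices(N, initial_window=73, steps_ahead=11))  # [84]
print(computable_residual_indices(N, initial_window=70, steps_ahead=11))  # [81, 82, 83, 84]
```

With initial_window equal to the full series length, no residual can be formed for an 11-step horizon, which matches the situation in the question.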
This means that when you use SquaringResiduals, the maximum possible initial window you can supply is max_initial_window = len(df)-max(out_sample_date). Notice that we are using max(out_sample_date) and not len(out_sample_date), because np.arange(12) only asks for forecasts of steps_ahead = 0, ..., 11, i.e. a maximum forecast horizon of 11.
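As a quick arithmetic check of that bound, here is a minimal sketch in plain Python. The function name max_initial_window is hypothetical, and range(12) stands in for the relative steps that np.arange(12) would produce:

```python
def max_initial_window(n_obs, relative_steps):
    """Largest initial window that still leaves room for the longest horizon.

    Hypothetical helper: n_obs is len(df), relative_steps are the relative
    forecast steps (the same values np.arange(12) would give).
    """
    return n_obs - max(relative_steps)


relative_steps = list(range(12))  # steps_ahead = 0, ..., 11
print(max(relative_steps))                     # 11, the maximum horizon
print(len(relative_steps))                     # 12, NOT the value to subtract
print(max_initial_window(84, relative_steps))  # 73
```

For the 84-observation series in the question, this gives an initial window of 73 rather than 84.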
Below is a fully reproducible example:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sktime.forecasting.arch import StatsForecastGARCH
from sktime.forecasting.squaring_residuals import SquaringResiduals
from sktime.forecasting.tbats import TBATS
from sktime.forecasting.base import ForecastingHorizon as FH
import warnings
import mlflow
from sktime.utils import mlflow_sktime as mf
from sktime.utils.plotting import plot_series
warnings.filterwarnings("ignore")
## make up some random data
np.random.seed(42)
dates = pd.date_range(start='2012-01-01',end='2019-01-01',freq='1M')
passengers = 40 + 10*np.sin(np.linspace(-np.pi, np.pi, len(dates))) + np.random.normal(loc=0, scale=2, size=len(dates))
df = pd.DataFrame(data={"passenger": passengers}, index=pd.PeriodIndex(data=dates, freq='M'))
def hybridModel(p,q,model):
    out_sample_date = FH(np.arange(12), is_relative=True)
    in_sample_date = FH(df.index, is_relative=False)
    max_initial_window = len(df)-max(out_sample_date) ## <-- max initial window cannot be any larger!
    var_fc = StatsForecastGARCH(p=p,q=q)
    sqr = SquaringResiduals(forecaster=model, residual_forecaster=var_fc, initial_window=max_initial_window)
    sqr = sqr.fit(df, fh=in_sample_date)
    y_pred = sqr.predict(in_sample_date) #in sample prediction
    sqr = sqr.fit(df, fh=out_sample_date)
    y_pred2 = sqr.predict(out_sample_date) #out sample prediction
    fig,ax=plot_series(df, y_pred, y_pred2, labels=["passenger", "y_pred", "y_pred2"])
    plt.plot()
    return sqr,fig,y_pred
forecaster = TBATS(
    use_box_cox=True,
    use_trend=True,
    use_damped_trend=True,
    sp=12,
    use_arma_errors=True,
    n_jobs=1)
forecaster.fit(df)
sqr,fig1,y_pred1= hybridModel(1,1,forecaster)
fig1.show()