I'm doing a time series project. I fitted a SARIMAX model using the statsmodels statespace SARIMAX function (code below), then I want to simulate some future paths with errors created by me.
The base code for generating a SARIMAX simulation is the following:
from statsmodels.tsa.statespace.sarimax import SARIMAX
model = SARIMAX(time_series, order=order, seasonal_order=seasonal_order,
simple_differencing=False, trend=trend).fit(disp=False, maxiter=3000)
initial = model.predicted_state[:,-1]
simulation = model.simulate(nsimulations=20, anchor='end', repetitions=1000, initial_state=initial)
I now want to insert my errors, which I generated from a normal distribution. I see that the function simulate
takes as input the parameters called measurement_shocks
and state_shocks
. However it's not clear to me how to use them.
My errors are an array with shape (20, 1000), because I want to pass 20 errors (the number of steps) for all the 1000 repetitions, i.e. all the 1000 paths. I pass them as measurement_shocks
. However, this gives me an error telling me I have to pass 20 errors, not 20000 (20*1000). This is strange, since other models (ETSModel
) didn't give me that error and everything went fine with them.
Now I try giving the function 20 errors, thinking I can only create one path at a time:
simulation = model.simulate(nsimulations=20, anchor='end', repetitions=1,
initial_state=initial, measurement_shocks=errors[:,0])
I get some results. However when I rerun the cell (working on Jupyter), the output changes. Didn't expect that given that I give the function the errors. I realized that that's because the state_shocks
are drawn randomly. So I tried fixing the state_shocks to zero (np.zeros((20,1))
), but that gave me an output that didn't take into account my errors.
I've played for almost two weeks with initial_state
, state_shocks
, measurement_shocks
, I've checked every output I obtained from these experiments with the sum of the predicted_mean
and my errors, I've tried implementing several types of simulation in the attempt to recreate the correct output, but I found no info nor solution.
I'm not familiar with the state space formulation, and I don't think I'm ready to delve into it at the moment.
What should I do?
Found the answer, I needed to do things a little differently.
simulation = model.simulate(nsimulations=21, anchor=n_data-1, repetitions=1,
initial_state=model.smoothed_state[:, -1],
measurement_shocks=np.zeros((21, 1)),
state_shocks=np.append(errors[:, 0], 0)
)
This is because I started the simulation not from the last value of my data, but from the previous one, so as to have the last smoothed state as the first simulated value. Then I passed my errors as state shocks (padded with one zero), while I set the measurement shock as zero. The padding is used to maintain coherence with the number of simulated steps.