I am looking for a Python library or example that produces a set of prediction intervals (not confidence intervals, since I am predicting future values) for time series analysis. I have code that reads a CSV file containing two fields: a date field and a value field.
Date Value
Dec-21 19.80
Jan-22 19.80
Feb-22 19.70
Mar-22 20.00
Apr-22 19.90
May-22 20.00
Jun-22 20.00
Jul-22 20.00
Aug-22 20.00
Sep-22 20.10
Oct-22 20.00
Nov-22 20.10
Dec-22 20.00
Jan-23 20.20
Feb-23 20.30
Mar-23 20.30
Apr-23 20.50
May-23 20.40
Jun-23 20.40
Jul-23 20.60
Aug-23 20.50
Sep-23 20.62
Oct-23 20.64
Nov-23 20.65
Dec-23 20.78
Jan-24 20.74
Feb-24 20.81
Mar-24 20.90
Apr-24 20.85
May-24 21.00
Jun-24 20.97
Jul-24 21.04
Aug-24 21.13
Sep-24 21.09
Oct-24 21.22
Nov-24 21.21
Dec-24 21.25
That code will run an ARIMA model and produce a series of confidence intervals:
import warnings
from pandas import read_csv
from statsmodels.tools.sm_exceptions import ConvergenceWarning
from statsmodels.tsa.arima.model import ARIMA
# silence convergence warnings, then all remaining warnings
warnings.simplefilter('ignore', ConvergenceWarning)
warnings.filterwarnings("ignore")
# summarize multiple confidence intervals on an ARIMA forecast for Diverse %
# load data
series = read_csv("C:\\mydirectory\\myfile.csv", header=0, index_col=0, parse_dates=True).squeeze("columns")
# split data into train and test sets
X = series.values
X = X.astype('float32')
size = len(X) - 1
train, test = X[0:size], X[size:]
# fit an ARIMA model
model = ARIMA(train, order=(5,1,1))
model_fit = model.fit()
result = model_fit.get_forecast(maxiter=200)
forecast = result.predicted_mean
# summarize confidence intervals
intervals = [0.95,0.90,0.85,0.80,0.75,0.70,0.65, 0.60,0.55,0.50,0.45,0.40,0.35,0.30,0.25,0.2, 0.1, 0.05, 0.01]
for a in intervals:
    ci = result.conf_int(alpha=a)
    print('%.1f%% Confidence Interval: %.3f between %.3f and %.3f' % ((1-a)*100, forecast[0], ci[0,0], ci[0,1]))
...and it returns the confidence intervals:
5.0% Confidence Interval: 21.252 between 21.247 and 21.256
10.0% Confidence Interval: 21.252 between 21.243 and 21.260
15.0% Confidence Interval: 21.252 between 21.239 and 21.264
20.0% Confidence Interval: 21.252 between 21.234 and 21.269
25.0% Confidence Interval: 21.252 between 21.230 and 21.273
30.0% Confidence Interval: 21.252 between 21.225 and 21.278
35.0% Confidence Interval: 21.252 between 21.220 and 21.283
40.0% Confidence Interval: 21.252 between 21.216 and 21.287
45.0% Confidence Interval: 21.252 between 21.211 and 21.292
50.0% Confidence Interval: 21.252 between 21.205 and 21.298
55.0% Confidence Interval: 21.252 between 21.200 and 21.303
60.0% Confidence Interval: 21.252 between 21.194 and 21.309
65.0% Confidence Interval: 21.252 between 21.188 and 21.316
70.0% Confidence Interval: 21.252 between 21.181 and 21.323
75.0% Confidence Interval: 21.252 between 21.173 and 21.330
80.0% Confidence Interval: 21.252 between 21.164 and 21.339
90.0% Confidence Interval: 21.252 between 21.139 and 21.364
95.0% Confidence Interval: 21.252 between 21.117 and 21.386
99.0% Confidence Interval: 21.252 between 21.075 and 21.428
What I am looking for are the prediction intervals at 5%, 10%...90%.
I have tried running this updated code:
import warnings
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
# Load data
data = pd.read_csv("C:\\mydirectory\\myfile.csv", header=0, parse_dates=True, index_col=0)
# Split data into train and test sets
train_size = int(len(data) * 0.8)
train, test = data.iloc[:train_size], data.iloc[train_size:]
# Fit an ARIMA model
model = ARIMA(train, order=(5, 1, 1))
model_fit = model.fit()
# Forecast future values
n_forecast = len(test)
forecast, stderr, conf_int = model_fit.forecast(steps=n_forecast, alpha=0.05)
# Plot the actual vs. predicted values with prediction intervals
plt.figure(figsize=(12, 6))
plt.plot(train.index, train.values, label='Training Data', color='blue')
plt.plot(test.index, test.values, label='Actual Data', color='green')
plt.plot(test.index, forecast, label='Predicted Data', color='red')
# Fill prediction intervals
plt.fill_between(test.index, conf_int[:, 0], conf_int[:, 1], color='pink', alpha=0.3, label='Prediction Intervals')
plt.title('Time Series Forecast with Prediction Intervals')
plt.xlabel('Date')
plt.ylabel('Value')
plt.legend(loc='upper left')
plt.grid(True)
plt.show()
# Print prediction intervals
prediction_intervals = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95]
for alpha in prediction_intervals:
    z_score = model_fit.get_forecast(steps=n_forecast).zconfint(alpha=alpha)
    lower_bound = z_score[:, 0]
    upper_bound = z_score[:, 1]
    print('%.1f%% Prediction Interval: %.3f between %.3f and %.3f' % ((1 - alpha) * 100, forecast[0], lower_bound[0], upper_bound[0]))
which raises an error:
ValueError Traceback (most recent call last)
Cell In[27], line 3
1 # Forecast future values
2 n_forecast = len(test)
----> 3 forecast, stderr, conf_int = model_fit.forecast(steps=n_forecast, alpha=0.05)
ValueError: too many values to unpack (expected 3)
Please advise. Thanks.
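Based on the documentation, the forecast method returns only a single NumPy array, pandas Series, or pandas DataFrame, depending on the dimensions and inputs. Your code tries to unpack that return value into three names (forecast, stderr, conf_int), which only works if the returned object has exactly 3 elements. Here it contains one forecast per step (len(test) values), hence "too many values to unpack (expected 3)".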