I'm using the SVR-GARCH model to predict conditional volatility, as described in the book Machine Learning for Financial Risk Management with Python: Algorithms for Modeling Risk by Abdullah Karasan.
I’ve encountered an issue where my code sometimes produces the same repeated value for the conditional volatility across the entire forecasting horizon. I understand that the initial parameter values are randomized, but I am confused about why, in most cases, the prediction results in a constant value throughout the forecast period.
import yfinance as yf
from datetime import datetime, timedelta
import pandas as pd
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform as sp_rand
from sklearn.preprocessing import StandardScaler
# Select assets
stock_name = ['AAPL']
end_date = datetime.today()
start_date = end_date - timedelta(days = 365 * 25)
# Download the prices
prices = yf.download(
    stock_name,
    start=start_date,
    end=end_date,
    interval='1d',
)['Adj Close']
prices = prices.dropna()
stock_name = ['Apple']
# rename() with inplace=True returns None, so assign the renamed copy instead
prices = prices.rename(stock_name[0])
# Log returns
returns = np.log(np.array(prices)[1:] / np.array(prices)[:-1])
# Forecasting horizon
H = 146
returns_series = pd.Series(returns)
realized_vol = returns_series.rolling(5).std()
realized_vol = pd.DataFrame(realized_vol)
realized_vol.reset_index(drop=True, inplace=True)
returns_svm = pd.DataFrame(returns ** 2)
X = pd.concat([realized_vol, returns_svm], axis=1, ignore_index=True)
X = X[4:].copy()
X = X.reset_index()
X.drop('index', axis=1, inplace=True)
realized_vol = realized_vol.dropna().reset_index()
realized_vol.drop('index', axis=1, inplace=True)
conditional_volatility = pd.DataFrame(index=prices.index[-H:], columns=['SVM Linear','SVM RBF','SVM Poly'])
para_grid = {'gamma': sp_rand(0.1, 1), 'C': sp_rand(0.1, 10), 'epsilon': sp_rand(0.1, 1)}
svr_lin = SVR(kernel='linear')
clf = RandomizedSearchCV(svr_lin, para_grid)
clf.fit(X[:-H], realized_vol.iloc[1:-(H-1)].values.reshape(-1,))
predict_svr_lin = clf.predict(X[-H:])
conditional_volatility['SVM Linear'] = predict_svr_lin
svr_rbf = SVR(kernel='rbf')
clf = RandomizedSearchCV(svr_rbf, para_grid)
clf.fit(X[:-H], realized_vol.iloc[1:-(H-1)].values.reshape(-1,))
predict_svr_rbf = clf.predict(X[-H:])
conditional_volatility['SVM RBF'] = predict_svr_rbf
svr_poly = SVR(kernel='poly')
clf = RandomizedSearchCV(svr_poly, para_grid)
clf.fit(X[:-H], realized_vol.iloc[1:-(H-1)].values.reshape(-1,))
predict_svr_poly = clf.predict(X[-H:])
conditional_volatility['SVM Poly'] = predict_svr_poly
print(conditional_volatility)
Output:
[*********************100%%**********************] 1 of 1 completed
            SVM Linear   SVM RBF  SVM Poly
Date
2024-01-09    0.168156  0.168156  0.138204
2024-01-10    0.168156  0.168156  0.138204
2024-01-11    0.168156  0.168156  0.138204
2024-01-12    0.168156  0.168156  0.138204
2024-01-16    0.168156  0.168156  0.138204
...                ...       ...       ...
2024-08-01    0.168156  0.168156  0.138204
2024-08-02    0.168156  0.168156  0.138204
2024-08-05    0.168156  0.168156  0.138204
2024-08-06    0.168156  0.168156  0.138204
2024-08-07    0.168156  0.168156  0.138204
[146 rows x 3 columns]
Why might this be happening, and how can I address it?
The issue is with your epsilon values. According to the SVR documentation:
epsilon - Epsilon in the epsilon-SVR model. It specifies the epsilon-tube within which no penalty is associated in the training loss function with points predicted within a distance epsilon from the actual value. Must be non-negative.
Thus, with a high epsilon (much greater than the volatility values), bad predictions are not penalized at all.
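To make the effect concrete, here is a minimal sketch on synthetic data (the arrays `X_demo` and `y_demo` are made up for illustration and are not the AAPL series). It assumes targets on roughly the same scale as daily realized volatility and compares a tube that is much wider than the target spread with one that is narrower:

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic stand-in for the real data (an assumption): targets on the
# scale of daily realized volatility, roughly 0.02 to 0.06.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 0.08, size=(300, 2))
y_demo = 0.02 + 0.5 * X_demo[:, 0] + rng.normal(0.0, 0.001, size=300)

# epsilon far above the spread of y: every point fits inside the tube.
wide = SVR(kernel='linear', C=1.0, epsilon=0.5).fit(X_demo, y_demo)
# epsilon below the spread of y: points outside the tube are penalized.
narrow = SVR(kernel='linear', C=1.0, epsilon=0.001).fit(X_demo, y_demo)

pred_wide = wide.predict(X_demo)
pred_narrow = narrow.predict(X_demo)

print('support vectors (wide tube):  ', len(wide.support_))
print('support vectors (narrow tube):', len(narrow.support_))
print('prediction range (wide tube):  ', np.ptp(pred_wide))
print('prediction range (narrow tube):', np.ptp(pred_narrow))
```

With the wide tube the model should end up with no support vectors, so every prediction collapses to the intercept, which is the same symptom as the repeated 0.168156 in the question.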
Let's check how your data is distributed:
print(np.quantile(realized_vol, q = [0,0.01,0.05,0.5,0.95, 0.99,1]))
# [0.00075101 0.00375092 0.0058641 0.04780242 0.06916338 0.33556116]
Basically, volatility is very low at the beginning of the dataset and much higher in later periods.
Anyway, if you allow epsilon to be close to 1 (as in your code), none of the volatility points is penalized, because they are all far below 1. The model then has no incentive to move away from a constant prediction.
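It may also help to check what range the grid actually samples from. In SciPy, `uniform(loc, scale)` covers `[loc, loc + scale]`, so the second argument is a width and not an upper bound. A quick check (the variable names here are only for illustration):

```python
from scipy.stats import uniform as sp_rand

# Range sampled by the epsilon entry in the question's grid.
eps_low, eps_high = sp_rand(0.1, 1).ppf([0, 1])
print(eps_low, eps_high)

# Range sampled by a much narrower alternative.
fix_low, fix_high = sp_rand(0.001, 0.01).ppf([0, 1])
print(fix_low, fix_high)
```

So the original grid draws epsilon from 0.1 to 1.1. Even the smallest draw is above most of the quantiles printed above.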
The fix is to change the parameter grid to something like this:
para_grid = {'gamma': sp_rand(0.1, 1),
             'C': sp_rand(0.1, 10),
             'epsilon': sp_rand(0.001, 0.01)
             }
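As a rough check of the narrower grid, here is a sketch on synthetic data (again, `X_demo`, `y_demo` and `demo_grid` are hypothetical names, not your variables). It uses only the linear kernel, so `gamma` is left out, and it passes `random_state` so that the sampled candidates are the same on every run:

```python
import numpy as np
from scipy.stats import uniform as sp_rand
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVR

# Synthetic stand-in for the volatility features and target (an assumption).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 0.08, size=(300, 2))
y_demo = 0.02 + 0.5 * X_demo[:, 0] + rng.normal(0.0, 0.001, size=300)

# gamma is omitted because the linear kernel does not use it.
demo_grid = {'C': sp_rand(0.1, 10), 'epsilon': sp_rand(0.001, 0.01)}

# random_state fixes the sampled candidates so repeated runs agree.
search = RandomizedSearchCV(SVR(kernel='linear'), demo_grid,
                            n_iter=10, random_state=0)
search.fit(X_demo[:-50], y_demo[:-50])
pred = search.predict(X_demo[-50:])

print(search.best_params_)
print('distinct predictions:', np.unique(pred.round(6)).size)
```

Because every sampled epsilon is now smaller than the spread of the target, the fitted model should produce predictions that vary with the inputs instead of a single repeated value.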
Alternatively, multiply the volatility by a large constant (after all, volatility is a percentage):
para_grid = {'gamma': sp_rand(0.1, 1),
             'C': sp_rand(0.1, 10),
             'epsilon': sp_rand(0.1, 1)
             }
realized_vol = realized_vol * 100
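If you go the rescaling route, remember that the predictions come back in the rescaled units. A minimal sketch of the round trip, using synthetic data and a fixed `epsilon=0.5` (both are assumptions for illustration; the features are left in their original units here):

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic stand-in for the volatility features and target (an assumption).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 0.08, size=(300, 2))
y_demo = 0.02 + 0.5 * X_demo[:, 0] + rng.normal(0.0, 0.001, size=300)

scale = 100  # express the target in percent
model = SVR(kernel='linear', C=10.0, epsilon=0.5)
model.fit(X_demo[:-50], y_demo[:-50] * scale)

# Predictions are in percent; divide to return to the original units.
pred_pct = model.predict(X_demo[-50:])
pred = pred_pct / scale
print(pred[:5])
```

Here the rescaled target spans a few units, so an epsilon of 0.5 no longer swallows every point.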