python scikit-learn time-series svm

Why does GARCH-SVM output identical predictions for conditional volatility?


I'm using the SVR-GARCH model to predict conditional volatility, as described in the book Machine Learning for Financial Risk Management with Python: Algorithms for Modeling Risk by Abdullah Karasan.

I've encountered an issue where my code sometimes produces the same repeated value for the conditional volatility across the entire forecasting horizon. I understand that RandomizedSearchCV samples the hyperparameters at random, but I am confused about why, in most runs, the prediction comes out as a single constant value throughout the forecast period.

import yfinance as yf
from datetime import datetime, timedelta
import pandas as pd
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform as sp_rand
from sklearn.preprocessing import StandardScaler

# Select assets
stock_name = ['AAPL']
end_date = datetime.today()
start_date = end_date - timedelta(days = 365 * 25)

# Download the prices
prices = yf.download(
    stock_name,
    start = start_date,
    end = end_date,
    interval = '1d',
)['Adj Close']
prices = prices.dropna()
stock_name = ['Apple']
prices = prices.rename(stock_name[0])  # note: rename(..., inplace=True) returns None

# Log returns
returns = np.log(np.array(prices)[1:] / np.array(prices)[:-1])

# Forecasting horizon
H = 146

# Realized volatility: 5-day rolling standard deviation of the log returns
returns_series = pd.Series(returns)
realized_vol = pd.DataFrame(returns_series.rolling(5).std())
realized_vol.reset_index(drop=True, inplace=True)

# Squared returns as the second SVR feature
returns_svm = pd.DataFrame(returns ** 2)

# Feature matrix: realized volatility and squared returns, aligned after the 5-day rolling window
X = pd.concat([realized_vol, returns_svm], axis=1, ignore_index=True)
X = X[4:].copy()
X.reset_index(drop=True, inplace=True)

realized_vol = realized_vol.dropna().reset_index(drop=True)

conditional_volatility = pd.DataFrame(index=prices.index[-H:], columns=['SVM Linear','SVM RBF','SVM Poly'])

para_grid = {'gamma': sp_rand(0.1, 1), 'C': sp_rand(0.1, 10), 'epsilon': sp_rand(0.1, 1)}

# Fit an SVR per kernel; features up to day t are paired with day t+1 realized volatility
svr_lin = SVR(kernel='linear')
clf = RandomizedSearchCV(svr_lin, para_grid)
clf.fit(X[:-H], realized_vol.iloc[1:-(H-1)].values.reshape(-1,))
predict_svr_lin = clf.predict(X[-H:])
conditional_volatility['SVM Linear'] = predict_svr_lin

svr_rbf = SVR(kernel='rbf')
clf = RandomizedSearchCV(svr_rbf, para_grid)
clf.fit(X[:-H], realized_vol.iloc[1:-(H-1)].values.reshape(-1,))
predict_svr_rbf = clf.predict(X[-H:])
conditional_volatility['SVM RBF'] = predict_svr_rbf

svr_poly = SVR(kernel='poly')
clf = RandomizedSearchCV(svr_poly, para_grid)
clf.fit(X[:-H], realized_vol.iloc[1:-(H-1)].values.reshape(-1,))
predict_svr_poly = clf.predict(X[-H:])
conditional_volatility['SVM Poly'] = predict_svr_poly

print(conditional_volatility)

Output:

            SVM Linear   SVM RBF  SVM Poly
Date                                      
2024-01-09    0.168156  0.168156  0.138204
2024-01-10    0.168156  0.168156  0.138204
2024-01-11    0.168156  0.168156  0.138204
2024-01-12    0.168156  0.168156  0.138204
2024-01-16    0.168156  0.168156  0.138204
...                ...       ...       ...
2024-08-01    0.168156  0.168156  0.138204
2024-08-02    0.168156  0.168156  0.138204
2024-08-05    0.168156  0.168156  0.138204
2024-08-06    0.168156  0.168156  0.138204
2024-08-07    0.168156  0.168156  0.138204

[146 rows x 3 columns]

Why might this be happening, and how can I address it?


Solution

  • The issue is with your epsilon values. According to the scikit-learn documentation for SVR:

    epsilon - Epsilon in the epsilon-SVR model. It specifies the epsilon-tube within which no penalty is associated in the training loss function with points predicted within a distance epsilon from the actual value. Must be non-negative.

    Thus, with a large epsilon (much larger than the volatility values themselves), no training point ever incurs a penalty, so the optimizer has no reason to fit the data at all.

    Let's check how your data is distributed:

    print(np.quantile(realized_vol, q = [0,0.01,0.05,0.5,0.95, 0.99,1]))
    # [0.00075101 0.00375092 0.0058641  0.04780242 0.06916338 0.33556116]
    

    Basically, realized volatility is very low for most of the sample and only spikes occasionally, but even the largest value is only about 0.34.

    So if epsilon is allowed to be close to 1 (as in your grid), every realized-volatility point already sits inside the epsilon-tube, because they are all far below 1. No penalty is ever applied, the regularizer then favours the flattest possible function, and the fitted model degenerates to a constant: that is exactly why every forecast is identical.
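
    You can reproduce this in isolation with a minimal synthetic sketch (the data and numbers below are made up purely for illustration, not taken from the book's pipeline): when epsilon dwarfs the targets, SVR collapses to a single constant, and shrinking epsilon restores a varying fit.

    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(0)
    X_toy = rng.normal(size=(200, 2))
    y_toy = 0.02 + 0.005 * X_toy[:, 0]    # targets on the scale of realized volatility

    # epsilon larger than the whole target range -> no penalties, flat prediction
    flat = SVR(kernel='rbf', epsilon=1.0).fit(X_toy, y_toy).predict(X_toy)
    print(np.unique(flat.round(6)).size)  # 1: every prediction is the same constant

    # epsilon well below the target scale -> predictions actually vary
    ok = SVR(kernel='rbf', epsilon=0.001).fit(X_toy, y_toy).predict(X_toy)
    print(ok.std() > 0)                   # True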

    The fix is to change the parameter grid to something like this (sp_rand(loc, scale) samples uniformly from [loc, loc + scale], so epsilon now stays far below the realized-volatility values):

    para_grid = {'gamma': sp_rand(0.1, 1), 
                 'C': sp_rand(0.1, 10), 
                 'epsilon': sp_rand(0.001, 0.01)
                }
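
    After refitting with this grid, a quick check (a sketch using clf, X and H from your own code) confirms that the search picked an epsilon on a sensible scale and that the forecasts actually vary:

    print(clf.best_params_)                 # the chosen epsilon should be well below the volatility values
    print(np.ptp(clf.predict(X[-H:])) > 0)  # True once the forecasts are no longer constant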
    

    Alternatively, keep the original grid and multiply the volatility by a large constant (after all, volatility is a percentage), so the targets are no longer dwarfed by epsilon:

    para_grid = {'gamma': sp_rand(0.1, 1), 
                 'C': sp_rand(0.1, 10), 
                 'epsilon': sp_rand(0.1, 1)
                 }
    
    realized_vol = realized_vol * 100
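
    If you go with the rescaling route, do the multiplication before fitting and convert the forecasts back to the original units afterwards, for example (sketch, using the RBF model from your code):

    predict_svr_rbf = clf.predict(X[-H:])
    conditional_volatility['SVM RBF'] = predict_svr_rbf / 100  # back to the original volatility scale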