python, time-series, volatility, arch

Forecasting Volatility with EGARCH(1,1) Using the `arch` Package


Purpose

I want to forecast daily volatility with an EGARCH(1,1) model using the arch package.
Forecast interval: 01-04-2015 to 12-06-2018 (mm-dd-yyyy format)

So I need to fit the EGARCH(1,1) model on earlier data (for example, from 2013 until the start of 2015) and then forecast daily volatility for 01-04-2015 to 12-06-2018.


Code

So I tried to write it like this:

# Packages that we need
from pandas_datareader import data as web
from arch import arch_model
import numpy as np  # needed for np.log below
import pandas as pd
#---------------------------------------

# grab Microsoft daily adjusted close price data from '01-03-2013' to '12-06-2018' and store it in DataFrame
df = pd.DataFrame(web.get_data_yahoo('MSFT' , start='01-03-2013' , end='12-06-2018')['Adj Close'])

#---------------------------------------

# calculate daily rate of return that is necessary for predicting daily Volatility by EGARCH
daily_rate_of_return_EGARCH = np.log(df.loc[ : '01-04-2015']/df.loc[ : '01-04-2015'].shift())
# drop NaN values
daily_rate_of_return_EGARCH = daily_rate_of_return_EGARCH.dropna()

#---------------------------------------

# Volatility Forecasting By EGARCH(1,1)
model_EGARCH = arch_model(daily_rate_of_return_EGARCH, vol='EGARCH' , p = 1 , o = 0 , q = 1)
fitted_EGARCH = model_EGARCH.fit(disp='off')

#---------------------------------------

# and finally, Forecasting step
# Note that as mentioned in `purpose` section, predict interval should be from '01-04-2015' to end of the data frame
horizon = len(df.loc['01-04-2015' : ])
volatility_FORECASTED = fitted_EGARCH.forecast(horizon = horizon , method='simulation')

Error

And then I got this error:

MemoryError                               Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_12900/1021856026.py in <module>
      1 horizon = len(df.loc['01-04-2015':])
----> 2 volatility_FORECASTED = fitted_EGARCH.forecast(horizon = horizon , method='simulation') 

MemoryError: Unable to allocate 3.71 GiB for an array with shape (503, 1000, 989) and data type float64

It seems arch is trying to store a huge amount of data.
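As a quick sanity check, the size in the error message follows from the array shape alone: 503 × 1000 × 989 float64 values at 8 bytes each. The sketch below reproduces that arithmetic in plain Python. Reading the three dimensions as (observations, simulations, horizon) is my interpretation of the shape, not something the traceback states, and the one-row comparison at the end is purely hypothetical.

```python
# Reproduce the allocation size reported in the MemoryError.
shape = (503, 1000, 989)   # copied verbatim from the error message
bytes_per_float64 = 8

n_elements = 1
for dim in shape:
    n_elements *= dim

size_gib = n_elements * bytes_per_float64 / 2**30
print(f"{size_gib:.2f} GiB")   # prints "3.71 GiB", matching the error

# Hypothetical comparison: the same array if its first dimension were 1
# instead of 503 (i.e. a single row of simulated forecasts).
one_row_mib = (n_elements // shape[0]) * bytes_per_float64 / 2**20
print(f"{one_row_mib:.2f} MiB")
```

Nearly all of the allocation comes from the first dimension, so anything that shrinks it reduces the memory needed by a factor of about 500.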

Expected Result

What I expect is a simple pandas.Series that contains the daily volatility predictions from '01-04-2015' until '12-06-2018'. More precisely, I mean something like this:
(Note: date format --> mm-dd-yyyy)

    (DATE)     (VOLATILITY)
'01-04-2015'      .....
'01-05-2015'      .....
'01-06-2015'      .....
.                   .
.                   .
.                   .
'12-06-2018'      .....

How can I achieve this?


Solution

  • You only need to pass the reindex=False keyword, and the memory requirement drops dramatically. This feature requires a recent version of the arch package. It changes the output shape of the forecast to include only the forecast values (rather than one row for every observation in the original data), so the alignment differs from the historical behavior.

    # Packages that we need
    from pandas_datareader import data as web
    from arch import arch_model
    import numpy as np  # needed for np.log below
    import pandas as pd
    #---------------------------------------
    
    # grab Microsoft daily adjusted close price data from '01-03-2013' to '12-06-2018' and store it in DataFrame
    df = pd.DataFrame(web.get_data_yahoo('MSFT' , start='01-03-2013' , end='12-06-2018')['Adj Close'])
    
    #---------------------------------------
    
    # calculate daily rate of return that is necessary for predicting daily Volatility by EGARCH
    daily_rate_of_return_EGARCH = np.log(df.loc[ : '01-04-2015']/df.loc[ : '01-04-2015'].shift())
    # drop NaN values
    daily_rate_of_return_EGARCH = daily_rate_of_return_EGARCH.dropna()
    
    #---------------------------------------
    
    # Volatility Forecasting By EGARCH(1,1)
    model_EGARCH = arch_model(daily_rate_of_return_EGARCH, vol='EGARCH' , p = 1 , o = 0 , q = 1)
    fitted_EGARCH = model_EGARCH.fit(disp='off')
    
    #---------------------------------------
    
    # and finally, Forecasting step
    # Note that as mentioned in `purpose` section, predict interval should be from '01-04-2015' to end of the data frame
    horizon = len(df.loc['01-04-2015' : ])
    volatility_FORECASTED = fitted_EGARCH.forecast(horizon = horizon , method='simulation', reindex=False)
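To get from the forecast object to the dated pandas.Series described in the question, one option is to take the single row of variance forecasts, take its square root, and label each step with the corresponding out-of-sample date. The sketch below is a minimal illustration under stated assumptions, not a definitive recipe: `forecast_to_volatility_series` is a hypothetical helper, the stand-in numbers are made up so the block runs without downloading prices or fitting a model, and it assumes the forecast's `.variance` attribute is a DataFrame with one row and one column per step ahead when reindex=False is used. Pairing step k with the k-th trading date after the sample is also an assumption.

```python
import numpy as np
import pandas as pd


def forecast_to_volatility_series(variance_row, target_dates):
    """Hypothetical helper: one row of variance forecasts -> dated volatility Series.

    variance_row : 1-D array-like of forecast variances, one per step ahead
    target_dates : index of dates the steps are assumed to correspond to
    """
    variance_row = np.asarray(variance_row, dtype=float)
    if len(variance_row) != len(target_dates):
        raise ValueError("need exactly one target date per forecast step")
    # Volatility is the square root of the conditional variance.
    return pd.Series(np.sqrt(variance_row), index=target_dates, name="volatility")


# Stand-in data (made up) so the sketch is self-contained.
# With the code above, the assumed real inputs would be:
#   variance_row = volatility_FORECASTED.variance.iloc[-1]
#   target_dates = df.loc['01-04-2015':].index
dates = pd.bdate_range("2015-01-05", periods=5)
fake_variance = pd.DataFrame(
    [[4.0, 9.0, 16.0, 25.0, 36.0]],
    index=[pd.Timestamp("2015-01-02")],
    columns=[f"h.{i}" for i in range(1, 6)],
)

volatility = forecast_to_volatility_series(fake_variance.iloc[-1], dates)
print(volatility)
```

The length check guards against a mismatch between the horizon and the number of target dates, which would otherwise silently misalign the forecasts.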