python machine-learning time-series statsmodels autoregressive-models

Partial fit or incremental learning for autoregressive model


I have two time series representing two independent periods of data observation. I would like to fit an autoregressive model to this data. In other words, I would like to perform two partial fits, or two sessions of incremental learning.

This is a simplified description of a not-unusual scenario which could also apply to batch fitting on large datasets.

How do I do this (in statsmodels or otherwise)? Bonus points if the solution can generalise to other time-series models like ARIMA.

In pseudocode, something like:

import statsmodels.api as sm
from statsmodels.tsa.ar_model import AutoReg

data = sm.datasets.sunspots.load_pandas().data['SUNACTIVITY']
data_1 = data[:len(data)//3]
data_2 = data[len(data)-len(data)//3:]

# This is the standard single fit usage
res = AutoReg(data_1, lags=12).fit()
res.aic

# This is more like what I would like to do
model = AutoReg(lags=12)
model.partial_fit(data_1)
model.partial_fit(data_2)
model.results.aic

Solution

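  • Statsmodels does not directly provide a partial_fit-style API like this. As Kevin S mentioned, though, pmdarima has a wrapper that comes close: the update method of its ARIMA class. Per the documentation: "Update the model fit with additional observed endog/exog values." Note that update adds the new observations to the end of the existing series before refining the parameters, so the two periods are treated as one contiguous series.

    Here is an example adapted to your code: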

    from pmdarima.arima import ARIMA
    import statsmodels.api as sm
    
    data = sm.datasets.sunspots.load_pandas().data['SUNACTIVITY']
    data_1 = data[:len(data)//3]
    data_2 = data[len(data)-len(data)//3:]
    
    # Initial fit on the first period (AR(12) written as ARIMA(12, 0, 0))
    model = ARIMA(order=(12,0,0))
    model.fit(data_1)
    
    # Update the fitted parameters using the new observations
    model.update(data_2)
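If the two periods really are independent, one "otherwise" option is to do the incremental fit by hand. AutoReg estimates its coefficients by conditional maximum likelihood, which is ordinary least squares on the lagged values, and least squares can be built up segment by segment by summing X'X and X'y. The sketch below is a minimal illustration under that assumption, not part of statsmodels or pmdarima: the class name IncrementalAR and its partial_fit method are hypothetical, it uses only NumPy, it fits a constant plus the lag coefficients, and it does not compute AIC. Lags are built within each segment, so no lagged value ever reaches across the gap between periods.

```python
import numpy as np


class IncrementalAR:
    """Hypothetical helper: AR(p) with a constant, fitted by accumulated OLS."""

    def __init__(self, lags):
        self.lags = lags
        k = lags + 1                  # constant + one coefficient per lag
        self.xtx = np.zeros((k, k))   # running sum of X'X over all segments
        self.xty = np.zeros(k)        # running sum of X'y over all segments
        self.nobs = 0                 # number of usable (target) observations
        self.params = None            # [const, phi_1, ..., phi_p]

    def partial_fit(self, series):
        y = np.asarray(series, dtype=float)
        p = self.lags
        if y.size <= p:
            raise ValueError("segment must be longer than the number of lags")
        # Each row is [1, y[t-1], ..., y[t-p]] for a target y[t] with t >= p,
        # so lags are taken only from inside this segment.
        X = np.column_stack(
            [np.ones(y.size - p)]
            + [y[p - j : y.size - j] for j in range(1, p + 1)]
        )
        target = y[p:]
        self.xtx += X.T @ X
        self.xty += X.T @ target
        self.nobs += target.size
        # Solve the accumulated normal equations in the least-squares sense.
        self.params = np.linalg.lstsq(self.xtx, self.xty, rcond=None)[0]
        return self


# Usage on simulated data: two independent AR(2) segments
rng = np.random.default_rng(0)


def simulate_ar2(n):
    y = np.zeros(n)
    for t in range(2, n):
        y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()
    return y


model = IncrementalAR(lags=2)
model.partial_fit(simulate_ar2(500))
model.partial_fit(simulate_ar2(500))
print(model.params)  # constant, then the lag-1 and lag-2 coefficients
```

After both calls, the coefficients are the same as a single least-squares fit on the stacked design matrices of the two segments, which is what makes this a true partial fit for the AR case. It does not carry over to ARIMA models with MA terms, whose likelihood is not a plain least-squares problem.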