How to Use Lagged Time-Series Variables in a Python Pandas Regression Model?

I'm creating time-series econometric regression models. The data is stored in a pandas DataFrame.

How can I do lagged time-series econometric analysis in Python? I have used Eviews in the past (a standalone econometric program, i.e. not a Python package). To estimate an OLS equation in Eviews you can write something like:

equation eq1.ls log(usales) c log(usales(-1)) log(price(-1)) tv_spend radio_spend

Note the lagged dependent variable and the lagged price term. These lagged variables seem difficult to handle in Python, e.g. with scikit-learn or statsmodels (unless I've missed something).

Once I've created a model I'd like to perform tests and use the model to forecast.

I'm not interested in ARIMA, exponential smoothing, or Holt-Winters projections; I'm mainly interested in time-series OLS.

Solution

pandas lets you shift your data without moving the index. The shift method on a DataFrame df (or on a single column) creates leads and lags.

df.shift(-1)

creates a one-period lead (each row holds the next period's value), and

df.shift(1)

creates a one-period lag (each row holds the previous period's value).
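A minimal sketch of both directions on a small daily series (the dates and values are made up for illustration). The index stays the same; only the values move, and the row that has no neighbour becomes NaN:

```python
import pandas as pd

# Small daily series; the DatetimeIndex stays fixed while the values move.
idx = pd.date_range("2024-01-01", periods=4, freq="D")
s = pd.Series([10.0, 11.0, 12.0, 13.0], index=idx)

lag1 = s.shift(1)    # row t holds the value from t-1 (first row becomes NaN)
lead1 = s.shift(-1)  # row t holds the value from t+1 (last row becomes NaN)

print(pd.DataFrame({"value": s, "lag1": lag1, "lead1": lead1}))
```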

So if you have a daily time series, you can use shift(1) to create a 1-day lag of price. The first row has no earlier value, so its lag is NaN:

df['lagprice'] = df['price'].shift(1)
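Applied to the Eviews equation in the question, the log and lag terms become ordinary columns. This is a sketch with hypothetical numbers; the column names usales, price, tv_spend and radio_spend are taken from the Eviews command. Eviews adjusts the estimation sample to skip periods whose lags are missing, whereas in pandas you drop those rows yourself:

```python
import numpy as np
import pandas as pd

# Hypothetical data; column names mirror the Eviews equation in the question.
df = pd.DataFrame(
    {
        "usales": [100.0, 110.0, 105.0, 120.0, 125.0],
        "price": [2.0, 2.1, 2.0, 2.2, 2.3],
        "tv_spend": [5.0, 6.0, 5.5, 7.0, 7.5],
        "radio_spend": [1.0, 1.2, 1.1, 1.5, 1.4],
    },
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

# Eviews: log(usales), log(usales(-1)), log(price(-1))
df["log_usales"] = np.log(df["usales"])
df["log_usales_lag1"] = df["log_usales"].shift(1)
df["log_price_lag1"] = np.log(df["price"]).shift(1)

# The first period has no lagged values (NaN), so drop it before estimating.
model_df = df.dropna(subset=["log_usales_lag1", "log_price_lag1"])
print(model_df)
```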

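After that, if you want to do OLS with a single regressor, you can use scipy's linregress:

http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.linregress.html

Note that linregress fits only one explanatory variable. For an equation with several regressors, like the Eviews one in the question, statsmodels' OLS class handles multiple regression and also reports standard errors and test statistics.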
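A sketch of estimating on lagged columns, using synthetic noise-free data so the known coefficients are recovered exactly (the data-generating equations are assumptions for illustration only). linregress handles the one-regressor case; for a constant plus a lagged dependent variable plus a lagged regressor, this sketch swaps in NumPy's least-squares solver rather than statsmodels, so it gives coefficients only, without standard errors or tests. It ends with a one-step-ahead forecast:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic, noise-free data (an assumption for illustration only):
#   y_t = 2.0 + 0.3 * y_{t-1} + 1.5 * x_{t-1}
#   z_t = 1.0 + 0.5 * x_{t-1}
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 2.0 + 0.3 * y[t - 1] + 1.5 * x[t - 1]

df = pd.DataFrame({"y": y, "x": x})
df["y_lag1"] = df["y"].shift(1)
df["x_lag1"] = df["x"].shift(1)
df["z"] = 1.0 + 0.5 * df["x_lag1"]
data = df.dropna()  # drops row 0, whose lagged values are NaN

# One lagged regressor: scipy.stats.linregress is enough.
res = stats.linregress(data["x_lag1"], data["z"])
print(res.slope, res.intercept)  # recovers 0.5 and 1.0 (up to rounding)

# Several regressors (constant, lagged y, lagged x): linregress cannot do
# this, so the sketch solves the least-squares problem with NumPy instead.
X = np.column_stack([np.ones(len(data)), data["y_lag1"], data["x_lag1"]])
beta, *_ = np.linalg.lstsq(X, data["y"].to_numpy(), rcond=None)
print(beta)  # recovers [2.0, 0.3, 1.5] (up to rounding)

# One-step-ahead forecast: the regressors for the next period are the
# last observed y and x, so no future values are needed.
next_y = beta @ np.array([1.0, df["y"].iloc[-1], df["x"].iloc[-1]])
print(next_y)
```

Because every regressor is lagged by one period, the forecast for the next period only needs values already in the data; forecasting further ahead would require feeding each predicted y back in as the next lag.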