I'm creating time-series econometric regression models. The data is stored in a Pandas data frame.
How can I do lagged time-series econometric analysis in Python? I have used EViews in the past (a standalone econometrics program, i.e. not a Python package). To estimate an OLS equation in EViews you can write something like:
equation eq1.ls log(usales) c log(usales(-1)) log(price(-1)) tv_spend radio_spend
Note the lagged dependent variable and the lagged price term. It's these lagged variables that seem difficult to handle in Python, e.g. using scikit-learn or statsmodels (unless I've missed something).
Once I've created a model I'd like to perform tests and use the model to forecast.
I'm not interested in doing ARIMA, Exponential Smoothing, or Holt Winters time-series projections - I'm mainly interested in time-series OLS.
pandas allows you to shift your data without moving the index. The shift method on a DataFrame df creates leads and lags.
df.shift(-1)
will create a 1 period lead (each row takes the value from the next period) and
df.shift(1)
will create a 1 period lag.
So if you have a daily time series, you could use shift(1) to create a 1 day lag in your values of price, such as
df['lagprice'] = df['price'].shift(1)
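To make the direction of the shift concrete, here is a minimal sketch on a made-up four-day price series (the values and the leadprice column are illustrative assumptions, not from the question). A positive argument gives a lag, a negative one gives a lead, and the row with no neighbour to draw from becomes NaN.

```python
import pandas as pd

# Made-up daily prices, purely for illustration.
df = pd.DataFrame(
    {"price": [10.0, 11.0, 12.0, 13.0]},
    index=pd.date_range("2020-01-01", periods=4, freq="D"),
)

df["lagprice"] = df["price"].shift(1)    # previous day's price (lag)
df["leadprice"] = df["price"].shift(-1)  # next day's price (lead)

print(df)
```

The index is unchanged; only the values move, so the first lagprice and the last leadprice are NaN and need to be dropped before estimation.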
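After that, if you want to do OLS with a single explanatory variable you can look at scipy.stats.linregress here:
http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.linregress.html
Note that linregress fits only one regressor, so an equation with several right-hand-side variables, like the one in the question, needs a multiple-regression routine such as the OLS estimator in statsmodels.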
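For an equation with several regressors, like the EViews one in the question, one possible approach is to build the lagged columns with shift and pass them to the statsmodels formula interface. The sketch below assumes statsmodels is installed and uses randomly generated data. The column names follow the question, while the helper columns log_usales, log_usales_lag1 and log_price_lag1 are names invented here for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data purely for illustration; column names follow the question.
rng = np.random.default_rng(0)
n = 60
df = pd.DataFrame(
    {
        "usales": rng.uniform(50, 150, n),
        "price": rng.uniform(1, 5, n),
        "tv_spend": rng.uniform(0, 10, n),
        "radio_spend": rng.uniform(0, 10, n),
    },
    index=pd.date_range("2020-01-01", periods=n, freq="D"),
)

# Build the transformed and lagged variables as explicit columns.
df["log_usales"] = np.log(df["usales"])
df["log_usales_lag1"] = df["log_usales"].shift(1)
df["log_price_lag1"] = np.log(df["price"]).shift(1)

# The first row has no lagged values, so drop it before fitting.
est = df.dropna()

# The formula interface adds the intercept (the "c" in EViews) automatically.
results = smf.ols(
    "log_usales ~ log_usales_lag1 + log_price_lag1 + tv_spend + radio_spend",
    data=est,
).fit()
print(results.params)

# In-sample fitted values; predict() also accepts a new DataFrame
# that has the same regressor columns.
fitted = results.predict(est)
```

results.summary() prints the usual regression table with standard errors and test statistics. Because the model contains a lagged dependent variable, forecasting more than one step ahead means feeding each predicted value back in as the next period's lag.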