Tags: pandas, dataframe, time-series, correlation, series

Computing Rolling autocorrelation using Pandas.rolling


I am attempting to calculate the rolling autocorrelation of a Series object using Pandas (0.23.3).

Setting up the example:

import numpy as np
import pandas as pd

dt_index = pd.date_range('2018-01-01', '2018-02-01', freq='B')  # business days only
data = np.random.rand(len(dt_index))
s = pd.Series(data, index=dt_index)

Creating a Rolling object with window size = 5:

r = s.rolling(5)

This gives:

Rolling [window=5,center=False,axis=0]

Now, when I try to calculate the correlation (I'm pretty sure this is the wrong approach):

r.corr(other=r)

I get only NaNs.

I tried another approach based on the documentation:

df = pd.DataFrame()
df['a'] = s
df['b'] = s.shift(-1)
df.rolling(window=5).corr()

Getting something like:

...
2018-03-01 a NaN NaN
           b NaN NaN

I'm really not sure where I'm going wrong with this. Any help would be immensely appreciated! The docs use float64 as well. Could it be that the correlation is so close to zero that it is displayed as NaN? Somebody raised a bug report about this here, but I think jreback solved the problem in a previous bug fix.

This is another relevant answer, but it uses pd.rolling_apply, which does not seem to be supported in Pandas 0.23.3.


Solution

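  • If I understand correctly, you want the lag-1 autocorrelation within each window, which you can get by applying Series.autocorr to every window: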

    >>> s.rolling(5).apply(lambda x: x.autocorr(), raw=False)
    
    2018-01-01         NaN
    2018-01-02         NaN
    2018-01-03         NaN
    2018-01-04         NaN
    2018-01-05   -0.502455
    2018-01-08   -0.072132
    2018-01-09   -0.216756
    2018-01-10   -0.090358
    2018-01-11   -0.928272
    2018-01-12   -0.754725
    2018-01-15   -0.822256
    2018-01-16   -0.941788
    2018-01-17   -0.765803
    2018-01-18   -0.680472
    2018-01-19   -0.902443
    2018-01-22   -0.796185
    2018-01-23   -0.691141
    2018-01-24   -0.427208
    2018-01-25    0.176668
    2018-01-26    0.016166
    2018-01-29   -0.876047
    2018-01-30   -0.905765
    2018-01-31   -0.859755
    2018-02-01   -0.795077
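
As a cross-check, here is a minimal sketch of a vectorized alternative that avoids the Python-level lambda. It rests on one observation: a window of 5 values contains only 4 (value, previous value) pairs, so the lag-1 autocorrelation of a 5-value window should match the rolling correlation between s and s.shift(1) over a window of 4 rows. The fixed seed and the names window, lag, via_apply and via_corr are assumptions added for illustration; they are not part of the original question.

```python
import numpy as np
import pandas as pd

# Same setup as in the question; the fixed seed is an assumption for repeatability.
rng = np.random.RandomState(0)
dt_index = pd.date_range('2018-01-01', '2018-02-01', freq='B')
s = pd.Series(rng.rand(len(dt_index)), index=dt_index)

window, lag = 5, 1

# Approach from the answer: autocorrelation computed inside each 5-value window.
via_apply = s.rolling(window).apply(lambda x: x.autocorr(lag), raw=False)

# Vectorized sketch: pair each value with the one `lag` rows earlier, then
# correlate the pairs over the window - lag rows that fit in one window.
via_corr = s.rolling(window - lag).corr(s.shift(lag))

# Show both results side by side for a visual comparison.
print(pd.DataFrame({'apply': via_apply, 'corr': via_corr}).tail())
```

Both series should be NaN for the first four dates and agree up to floating-point error afterwards. Note that s.rolling(5).corr(s.shift(1)) would not be the same thing, since it uses five pairs per window and reaches one value further back than the window used by apply.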