How can I compute a rolling correlation in Python Polars? Or at least a per-row correlation like:
pl.corr(pl.col('col_1'), pl.col('col_2'))
I am aware of the pandas solution:
pd_df = pl_df.to_pandas()
rol_corr_df = pd_df['col_1'].rolling(5).corr(pd_df['col_2'])
pl_df = pl_df.with_columns(correlation=pl.from_pandas(rol_corr_df))
Polars has rolling, but it needs to be pointed at a time or integer column to group by.
If you just want it to group by rows, you can use with_row_index to create an index.
Assume we start with
import numpy as np
import polars as pl

df = pl.DataFrame({'a': np.random.uniform(1, 100, 100), 'b': np.random.uniform(1, 100, 100)})
then we could do the following:
(
    df
    .with_row_index('i')          # integer index to roll over
    .rolling('i', period='10i')   # window of 10 index values
    .agg(rolling_corr=pl.corr('a', 'b'))
    .drop('i')
)
shape: (100, 1)
┌──────────────┐
│ rolling_corr │
│ ---          │
│ f64          │
╞══════════════╡
│ NaN          │
│ 1.0          │
│ -0.419386    │
│ -0.322489    │
│ …            │
│ -0.333332    │
│ -0.027533    │
│ 0.081232     │
│ 0.151985     │
└──────────────┘
Note that in rolling, the period is set to the string 10i, i.e. a window of 10 index values (10 rows here). If you had a datetime column there are more options; see the Polars documentation for rolling.