If I do a groupby() followed by a rolling() calculation on a frame with a multi-level index, one of the levels in the index is repeated, which is most odd. I am using pandas 0.18.1.
import pandas as pd
df = pd.DataFrame(data=[[1, 1, 10, 20], [1, 2, 30, 40], [1, 3, 50, 60],
                        [2, 1, 11, 21], [2, 2, 31, 41], [2, 3, 51, 61]],
                  columns=['id', 'date', 'd1', 'd2'])
df.set_index(['id', 'date'], inplace=True)
df = df.groupby(level='id').rolling(window=2)['d1'].sum()
print(df)
print(df.index)
The output is as follows:
id  id  date
1   1   1        NaN
        2       40.0
        3       80.0
2   2   1        NaN
        2       42.0
        3       82.0
Name: d1, dtype: float64
MultiIndex(levels=[[1, 2], [1, 2], [1, 2, 3]],
           labels=[[0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]],
           names=[u'id', u'id', u'date'])
What is odd is that the id level now shows up twice in the MultiIndex. Moving the ['d1'] column selection around doesn't make any difference.
Any help would be much appreciated.
Thanks, Paul
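The extra level in the output above carries exactly the same values as the id group key, so one possible workaround is simply to drop it after the rolling sum. This is a minimal sketch, assuming groupby().rolling() prepends the group key as shown in the output above; it uses reset_index(level=0, drop=True) because that call works on both old and current pandas, and it only drops a level when the result really has one more level than the original index.

```python
import pandas as pd

# Same frame as in the question, indexed by (id, date)
df = pd.DataFrame(data=[[1, 1, 10, 20], [1, 2, 30, 40], [1, 3, 50, 60],
                        [2, 1, 11, 21], [2, 2, 31, 41], [2, 3, 51, 61]],
                  columns=['id', 'date', 'd1', 'd2'])
df = df.set_index(['id', 'date'])

rolled = df.groupby(level='id')['d1'].rolling(window=2).sum()

# The outermost level is the group key that rolling() prepended.
# Drop it (by position) so the result lines up with the original
# (id, date) index again; skip this if no extra level was added.
if rolled.index.nlevels > df.index.nlevels:
    rolled = rolled.reset_index(level=0, drop=True)

print(rolled)
```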
It is a bug.
But the version with apply works fine; here is that alternative (only d1 was moved into apply):
df = df.groupby(level='id').d1.apply(lambda x: x.rolling(window=2).sum())
print(df)
id  date
1   1        NaN
    2       40.0
    3       80.0
2   1        NaN
    2       42.0
    3       82.0
Name: d1, dtype: float64
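One caveat, as far as I know: newer pandas releases (2.0 and later) prepend the group keys in apply by default even when the function returns a result with the original index, so the apply version above can show the repeated id level again there. A minimal sketch of two variants that should keep the (id, date) index on both old and new versions is to use transform, which always aligns its result to the original index, or to pass group_keys=False to groupby before calling apply.

```python
import pandas as pd

# Same frame as in the question, indexed by (id, date)
df = pd.DataFrame(data=[[1, 1, 10, 20], [1, 2, 30, 40], [1, 3, 50, 60],
                        [2, 1, 11, 21], [2, 2, 31, 41], [2, 3, 51, 61]],
                  columns=['id', 'date', 'd1', 'd2'])
df = df.set_index(['id', 'date'])

# transform() returns a result aligned to the original index,
# so no group-key level is prepended.
d1_sum = df.groupby(level='id')['d1'].transform(
    lambda x: x.rolling(window=2).sum())

# Equivalent apply() form: group_keys=False tells pandas not to
# prepend the id key to the per-group results.
d1_sum_apply = df.groupby(level='id', group_keys=False)['d1'].apply(
    lambda x: x.rolling(window=2).sum())

print(d1_sum)
```

Of the two, transform states the intent ("same shape as the input, computed per group") most directly, while the apply form stays closer to the code in this answer.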