I have a time series object grouped
of the type <pandas.core.groupby.SeriesGroupBy object at 0x03F1A9F0>
. grouped.sum()
gives the desired result but I cannot get rolling_sum to work with the groupby
object. Is there any way to apply rolling functions to groupby
objects? For example:
x = range(0, 6)
id = ['a', 'a', 'a', 'b', 'b', 'b']
df = DataFrame(zip(id, x), columns = ['id', 'x'])
df.groupby('id').sum()
id x
a 3
b 12
However, I would like to have something like:
id x
0 a 0
1 a 1
2 a 3
3 b 3
4 b 7
5 b 12
To answer the question directly, the cumsum method would produced the desired series:
In [17]: df
Out[17]:
id x
0 a 0
1 a 1
2 a 2
3 b 3
4 b 4
5 b 5
In [18]: df.groupby('id').x.cumsum()
Out[18]:
0 0
1 1
2 3
3 3
4 7
5 12
Name: x, dtype: int64
More generally, any rolling function can be applied to each group as follows (using the new .rolling method as commented by @kekert). Note that the return type is a multi-indexed series, which is different from previous (deprecated) pd.rolling_* methods.
In [10]: df.groupby('id')['x'].rolling(2, min_periods=1).sum()
Out[10]:
id
a 0 0.00
1 1.00
2 3.00
b 3 3.00
4 7.00
5 9.00
Name: x, dtype: float64
To apply the per-group rolling function and receive result in original dataframe order, transform should be used instead:
In [16]: df.groupby('id')['x'].transform(lambda s: s.rolling(2, min_periods=1).sum())
Out[16]:
0 0
1 1
2 3
3 3
4 7
5 9
Name: x, dtype: int64
For reference, here's how the now deprecated pandas.rolling_mean behaved:
In [16]: df.groupby('id')['x'].apply(pd.rolling_mean, 2, min_periods=1)
Out[16]:
0 0.0
1 0.5
2 1.5
3 3.0
4 3.5
5 4.5