pythonpandasgroup-byrolling-computationrolling-sum

How to use rolling functions for GroupBy objects


I have a time series object grouped of the type <pandas.core.groupby.SeriesGroupBy object at 0x03F1A9F0>. grouped.sum() gives the desired result but I cannot get rolling_sum to work with the groupby object. Is there any way to apply rolling functions to groupby objects? For example:

x = range(0, 6)
id = ['a', 'a', 'a', 'b', 'b', 'b']
df = DataFrame(zip(id, x), columns = ['id', 'x'])
df.groupby('id').sum()
id    x
a    3
b   12

However, I would like to have something like:

  id  x
0  a  0
1  a  1
2  a  3
3  b  3
4  b  7
5  b  12

Solution

  • cumulative sum

    To answer the question directly, the cumsum method would produced the desired series:

    In [17]: df
    Out[17]:
      id  x
    0  a  0
    1  a  1
    2  a  2
    3  b  3
    4  b  4
    5  b  5
    
    In [18]: df.groupby('id').x.cumsum()
    Out[18]:
    0     0
    1     1
    2     3
    3     3
    4     7
    5    12
    Name: x, dtype: int64
    

    pandas rolling functions per group

    More generally, any rolling function can be applied to each group as follows (using the new .rolling method as commented by @kekert). Note that the return type is a multi-indexed series, which is different from previous (deprecated) pd.rolling_* methods.

    In [10]: df.groupby('id')['x'].rolling(2, min_periods=1).sum()
    Out[10]:
    id
    a   0   0.00
        1   1.00
        2   3.00
    b   3   3.00
        4   7.00
        5   9.00
    Name: x, dtype: float64
    

    To apply the per-group rolling function and receive result in original dataframe order, transform should be used instead:

    In [16]: df.groupby('id')['x'].transform(lambda s: s.rolling(2, min_periods=1).sum())
    Out[16]:
    0    0
    1    1
    2    3
    3    3
    4    7
    5    9
    Name: x, dtype: int64
    

    deprecated approach

    For reference, here's how the now deprecated pandas.rolling_mean behaved:

    In [16]: df.groupby('id')['x'].apply(pd.rolling_mean, 2, min_periods=1)
    Out[16]: 
    0    0.0
    1    0.5
    2    1.5
    3    3.0
    4    3.5
    5    4.5