pythonpandasdataframegroup-byrunning-count

Add a sequential counter column on groups to a pandas dataframe


I feel like there is a better way than this:

import pandas as pd
df = pd.DataFrame(
    columns="   index    c1    c2    v1 ".split(),
    data= [
            [       0,  "A",  "X",    3, ],
            [       1,  "A",  "X",    5, ],
            [       2,  "A",  "Y",    7, ],
            [       3,  "A",  "Y",    1, ],
            [       4,  "B",  "X",    3, ],
            [       5,  "B",  "X",    1, ],
            [       6,  "B",  "X",    3, ],
            [       7,  "B",  "Y",    1, ],
            [       8,  "C",  "X",    7, ],
            [       9,  "C",  "Y",    4, ],
            [      10,  "C",  "Y",    1, ],
            [      11,  "C",  "Y",    6, ],]).set_index("index", drop=True)
def callback(x):
    x['seq'] = range(1, x.shape[0] + 1)
    return x
df = df.groupby(['c1', 'c2']).apply(callback)
print df

To achieve this:

   c1 c2  v1  seq
0   A  X   3    1
1   A  X   5    2
2   A  Y   7    1
3   A  Y   1    2
4   B  X   3    1
5   B  X   1    2
6   B  X   3    3
7   B  Y   1    1
8   C  X   7    1
9   C  Y   4    1
10  C  Y   1    2
11  C  Y   6    3

Is there a way to do it that avoids the callback?


Solution

  • use cumcount(), see docs here

    In [4]: df.groupby(['c1', 'c2']).cumcount()
    Out[4]: 
    0     0
    1     1
    2     0
    3     1
    4     0
    5     1
    6     2
    7     0
    8     0
    9     0
    10    1
    11    2
    dtype: int64
    

    If you want orderings starting at 1

    In [5]: df.groupby(['c1', 'c2']).cumcount()+1
    Out[5]: 
    0     1
    1     2
    2     1
    3     2
    4     1
    5     2
    6     3
    7     1
    8     1
    9     1
    10    2
    11    3
    dtype: int64