Calculate cumulative count of a pandas dataframe column


I have created this pandas dataframe:

import numpy as np
import pandas as pd

ds = {"col1":[1,2,3,2,2,2,3,4,1,0,0,0,0,0,1,2,3,5]}

df = pd.DataFrame(data=ds)

which looks like this:

print(df)

    col1
0      1
1      2
2      3
3      2
4      2
5      2
6      3
7      4
8      1
9      0
10     0
11     0
12     0
13     0
14     1
15     2
16     3
17     5

I need to create a new column (col2) that contains the cumulative count of each value in col1: for every row, the number of times that row's col1 value has appeared so far, including the current row. For example, the first 2 in col1 gets 1, the second 2 gets 2, and so on.

Does anybody know how to do it, please?


Solution

  • There is a function for exactly this, groupby.cumcount. It numbers the rows within each group starting from 0, so add 1 to start the count at 1:

    df['col2'] = df.groupby('col1').cumcount().add(1)
    

    Output:

        col1  col2
    0      1     1
    1      2     1
    2      3     1
    3      2     2
    4      2     3
    5      2     4
    6      3     2
    7      4     1
    8      1     2
    9      0     1
    10     0     2
    11     0     3
    12     0     4
    13     0     5
    14     1     3
    15     2     5
    16     3     3
    17     5     1
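
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the dataframe from the question, shows that cumcount is zero-based before the shift, and cross-checks the result against a plain Python running counter. The names zero_based, seen and expected are only illustrative and are not part of the original answer.

```python
from collections import defaultdict

import pandas as pd

# Same data as in the question
df = pd.DataFrame({"col1": [1, 2, 3, 2, 2, 2, 3, 4, 1, 0, 0, 0, 0, 0, 1, 2, 3, 5]})

# cumcount() numbers the rows inside each col1 group, starting at 0
zero_based = df.groupby("col1").cumcount()

# Shift by one so the first occurrence of a value is counted as 1
df["col2"] = zero_based + 1

# Cross-check with a running counter per value (illustrative only)
seen = defaultdict(int)
expected = []
for value in df["col1"]:
    seen[value] += 1
    expected.append(seen[value])

assert df["col2"].tolist() == expected
print(df)
```

Writing `zero_based + 1` is equivalent to the `.add(1)` call in the answer; which one to use is a matter of taste.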