pandascumulative-frequency

create a new column based on cumulative occurrences of a specific value in another column pandas


I want to count the number of occurrences of one specific value (string) in one column and write it down in another column cumulatively.

For example, counting the cumulative number of Y values here:

col_1  new_col
Y        1
Y        2
N        2
Y        3
N        3

I wrote this code but it gives me the final number instead of cumulative frequencies.

df['new_col'] = 0
df['new_col'] = df.loc[df.col_1 == 'Y'].count()

Solution

  • To count both values cumulatively you can use:

    df['new_col'] = (df
                     .groupby('col_1')
                     .cumcount().add(1)
                     .cummax()
                     )
    

    If you want to focus on 'Y':

    df['new_col'] = (df
                     .groupby('col_1')
                     .cumcount().add(1)
                     .where(df['col_1'].eq('Y'))
                     .ffill()
                     .fillna(0, downcast='infer')
                     )
    

    Output:

      col_1  new_col
    0     Y        1
    1     Y        2
    2     N        2
    3     Y        3
    4     N        3