
Conditional cumcount of values in second column


I want to fill in the numbers in column flag based on the values in column KEY.

Here is an example: df1 is the result I want to get from df0.

import pandas as pd

df0 = pd.DataFrame({'KEY':['0','0','0','0','1','1','1','2','2','2','2','2','3','3','3','3','3','3','4','5','6']})

df1 = pd.DataFrame({'KEY':['0','0','0','0','1','1','1','2','2','2','2','2','3','3','3','3','3','3','4','5','6'],
                    'flag':['0','0','1','1','2','2','3','4','4','5','5','6','7','7','8','8','9','9','10','11','12']})

Solution

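  • Take the cumcount within each KEY group and add one. Applying % 2 then gives 1 on the first, third, fifth, ... row of each group and 0 on the rows in between. The cumulative sum of those values over the whole column increases by one at the start of every pair of rows, and subtracting 1 makes the count start from zero.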

    You can use:

    df0['flag'] = ((df0.groupby('KEY').cumcount() + 1) % 2).cumsum() - 1
    df0
    Out[1]: 
       KEY  flag
    0    0      0
    1    0      0
    2    0      1
    3    0      1
    4    1      2
    5    1      2
    6    1      3
    7    2      4
    8    2      4
    9    2      5
    10   2      5
    11   2      6
    12   3      7
    13   3      7
    14   3      8
    15   3      8
    16   3      9
    17   3      9
    18   4     10
    19   5     11
    20   6     12
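
To see why the one-liner works, it may help to split it into its intermediate steps. The following is only a sketch of the same computation. The names pos, starts_pair, flag and steps are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Same KEY column as df0 in the question (21 rows).
keys = ['0'] * 4 + ['1'] * 3 + ['2'] * 5 + ['3'] * 6 + ['4', '5', '6']
df0 = pd.DataFrame({'KEY': keys})

# Step 1: 0-based position of each row within its KEY group.
pos = df0.groupby('KEY').cumcount()

# Step 2: 1 where a new pair starts (positions 0, 2, 4, ...), 0 otherwise.
starts_pair = (pos + 1) % 2

# Step 3: running count of pair starts over the whole column, shifted to start at 0.
flag = starts_pair.cumsum() - 1

# Side-by-side view of the intermediate columns (illustrative only).
steps = pd.DataFrame({'KEY': df0['KEY'], 'pos': pos,
                      'starts_pair': starts_pair, 'flag': flag})
print(steps)
```

Note that the result is an integer column, whereas df1 in the question stores flag as strings. If the string form is needed, flag.astype(str) should convert it.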