I need to get the exact 'number' of occurrence of a particular value in a column in another column in dataframe. To make it simpler, need the cumulative count of a repetitive value in a column
Consider the following example:
col_A | col_B |
---|---|
test1 | some1 |
test2 | some2 |
test2 | some3 |
test2 | some4 |
test3 | some5 |
test1 | some6 |
test3 | some7 |
I need a col_C that gets the exact occurence number of col_A values Consider the following:
col_A | col_B | col_C |
---|---|---|
test1 | some1 | 1 |
test2 | some2 | 1 |
test2 | some3 | 2 |
test2 | some4 | 3 |
test3 | some5 | 1 |
test1 | some6 | 2 |
test3 | some7 | 2 |
In row1 - 'test1' appears for the 1st time, so we got '1' in col_C; in row2 - 'test2' appears for the 1st time, so we got '1' in col_C; in row3 - 'test2' appears for the 2nd time, so we got '2' in col_C and so on
Did use value_counts, but am only getting the count of col_A values in col_C. I need the exact occurrence.
Thanks for the help!
What you're looking for is cumcount() from groupby. It counts the amount of the same values in each column starting from 0.
This code should work.
s = df.groupby('col_A').cumcount()
df['col_C'] = s+1