pandas group-by

Count and assign categories based on majority voting


I have a pandas dataframe in the below format:

 Class   Category
 XYZ     ABC
 XYZ     ABC
 XYZ     DEF
 XYZ1    ABC
 XYZ1    ABC
 XYZ1    ABC
 XYZ1    HLR
 XYZ2    ABC

For every unique class, if there are multiple observations for that class, I would like to assign the corresponding category to that class based on "majority voting".

For example, for "XYZ", Category should be "ABC".

For "XYZ1", category has to be "ABC" as well, because "HLR" appears only once.

If there are no discrepancies, then it's straightforward (for "XYZ2", it would be "ABC").

Is there a way to achieve this without storing the value counts in a table and then looping over it to assign categories by majority vote?

Any leads would be appreciated.


Solution

  • Try using mode:

    from statistics import mode
    # Select 'Category' explicitly so mode is computed on that column per Class
    df['New_Category'] = df.groupby('Class')['Category'].transform(mode)
    
    OUTPUT:
      Class Category New_Category
    0   XYZ      ABC          ABC
    1   XYZ      ABC          ABC
    2   XYZ      DEF          ABC
    3  XYZ1      ABC          ABC
    4  XYZ1      ABC          ABC
    5  XYZ1      ABC          ABC
    6  XYZ1      HLR          ABC
    7  XYZ2      ABC          ABC
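
A pandas-only variant may also be worth trying. The sketch below rebuilds the sample frame from the question and uses Series.mode() inside transform. The helper name majority is made up for illustration, and the code assumes every class has at least one non-null category. When two categories tie, Series.mode() returns them sorted, so this picks the alphabetically first one, which may or may not be the tie-break you want.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Class": ["XYZ", "XYZ", "XYZ", "XYZ1", "XYZ1", "XYZ1", "XYZ1", "XYZ2"],
    "Category": ["ABC", "ABC", "DEF", "ABC", "ABC", "ABC", "HLR", "ABC"],
})


def majority(s: pd.Series):
    """Hypothetical helper: return the most frequent value in s.

    Series.mode() returns all tied values in sorted order, so the first
    element is taken. Assumes s has at least one non-null value.
    """
    return s.mode().iat[0]


# transform broadcasts the per-group scalar back to every row of the group
df["New_Category"] = df.groupby("Class")["Category"].transform(majority)

# One row per class, if only the voted category is needed
per_class = df.groupby("Class")["Category"].agg(majority)
```

Here per_class holds one voted category per class, while New_Category keeps the original row count.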