pythonpandasdataframegroup-bydrop-duplicates

Python Pandas groupby agg or transform on max value_counts to drop duplicate rows


I have this df, and want to drop duplicates based on the max value counts of 'rating' (its binary field). None of the drop_duplicates with combination of grouby, max, count isn't fecthing the desired output. Any suggestion highly appreciated.

df = pd.DataFrame({
     'brand': ['Yum Yum', 'Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie', 'Cup','Pack'],
     'rating': [1, 1, 0, 1, 0, 0, 1, 0]})

Desired Output :

brand rating
Yum Yum 1
Indomie 0
Cup 1
Pack 0

tried below but not helping with desired output.

df.groupby(["brand",'rating']).max().reset_index()
df.drop_duplicates(subset=['brand'], keep='last')

Solution

  • Since the rating is binary, you can do mean and compare to 0.5:

    df.groupby('brand', sort=False)['rating'].mean().ge(0.5).astype(int).reset_index()
    

    In general, you can use mode:

    df.groupby('brand', sort=False)['rating'].apply(lambda x: x.mode()[0])
    

    Output:

         brand  rating
    0  Yum Yum       1
    1  Indomie       0
    2      Cup       1
    3     Pack       0