I have this df, and want to drop duplicates based on the max value counts of 'rating' (its binary field). None of the drop_duplicates with combination of grouby, max, count isn't fecthing the desired output. Any suggestion highly appreciated.
df = pd.DataFrame({
'brand': ['Yum Yum', 'Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie', 'Cup','Pack'],
'rating': [1, 1, 0, 1, 0, 0, 1, 0]})
Desired Output :
brand | rating |
---|---|
Yum Yum | 1 |
Indomie | 0 |
Cup | 1 |
Pack | 0 |
tried below but not helping with desired output.
df.groupby(["brand",'rating']).max().reset_index()
df.drop_duplicates(subset=['brand'], keep='last')
Since the rating is binary, you can do mean
and compare to 0.5:
df.groupby('brand', sort=False)['rating'].mean().ge(0.5).astype(int).reset_index()
In general, you can use mode
:
df.groupby('brand', sort=False)['rating'].apply(lambda x: x.mode()[0])
Output:
brand rating
0 Yum Yum 1
1 Indomie 0
2 Cup 1
3 Pack 0