I have a dataframe that contains some information about users. There is a column for name, a column for type, and a column for count, like this:
name type count
robert x 123
robert y 456
robert z 5123
charlie x 442123
charlie y 0
charlie z 42
I'm trying to figure out which type has the highest count per name, so for this case, I would want to select this:
name type count
robert z 5123
charlie x 442123
I know I can do something like this to get the max count per name, but I'm not sure how to include the "type" column, which is actually the most important one.
df.sort_values('count', ascending=False).drop_duplicates('name').sort_index()
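As a quick check, the one-liner above does carry the "type" column along, because it filters whole rows instead of aggregating a single column. Here is a minimal sketch, assuming the sample data from the question. The variable names by_sort and by_idxmax are only for illustration, and the idxmax variant is an alternative, not part of the original one-liner:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "name": ["robert", "robert", "robert", "charlie", "charlie", "charlie"],
    "type": ["x", "y", "z", "x", "y", "z"],
    "count": [123, 456, 5123, 442123, 0, 42],
})

# Sort so the largest count comes first, keep the first row seen per name,
# then restore the original row order.
by_sort = df.sort_values("count", ascending=False).drop_duplicates("name").sort_index()

# Alternative: find the index label of each name's max count and select those rows.
# sort=False keeps the names in order of first appearance.
by_idxmax = df.loc[df.groupby("name", sort=False)["count"].idxmax()]

print(by_sort)
print(by_idxmax)
```

Both return one full row per name, so if two rows tie for a name's max, only one of them survives.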
What if a name has two rows tied for the max count, with different types? For example:
print(df)
name type count
0 robert x 123
1 robert y 456
2 robert z 5123
3 robert a 5123
4 charlie x 442123
5 charlie y 0
6 charlie z 42
Use boolean indexing against the per-name max, which keeps every tied row:
df[df['count'] == df.groupby('name')['count'].transform('max')]
Output:
name type count
2 robert z 5123
3 robert a 5123
4 charlie x 442123
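To run this end to end, here is a self-contained sketch that rebuilds the frame with the tie from the example above. The names group_max and tied are illustrative only:

```python
import pandas as pd

# Sample data including the tie: robert has two rows with count 5123
df = pd.DataFrame({
    "name": ["robert"] * 4 + ["charlie"] * 3,
    "type": ["x", "y", "z", "a", "x", "y", "z"],
    "count": [123, 456, 5123, 5123, 442123, 0, 42],
})

# transform('max') returns a Series aligned with df's index,
# where every row holds the max count of its own name group.
group_max = df.groupby("name")["count"].transform("max")

# Keep the rows whose count equals their group's max; tied rows all survive.
tied = df[df["count"] == group_max]
print(tied)
```

Because the mask is built with transform, it has the same length and index as df, which is what makes the row-wise comparison line up.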