python pandas

Select rows with highest value from groupby


I have a dataframe that contains some information about users. There is a column for the user's name, a column for type, and a column for count, like this:

name         type     count
robert       x        123
robert       y        456
robert       z        5123
charlie      x        442123
charlie      y        0 
charlie      z        42

I'm trying to figure out which type has the highest count per name, so for this case, I would want to select this:

name         type    count
robert       z       5123
charlie      x       442123

I know I can do something like this to get the max count per name, but I'm not sure how to include the "type" column, which is actually the most important part:

df.sort_values('count', ascending=False).drop_duplicates('name').sort_index()

Solution

  • What if a name has two rows tied for the maximum count, each with a different type? For example:

    print(df)
    
          name type   count
    0   robert    x     123
    1   robert    y     456
    2   robert    z    5123
    3   robert    a    5123
    4  charlie    x  442123
    5  charlie    y       0
    6  charlie    z      42
    

    Use boolean indexing against the per-group maximum, which keeps every tied row:

    df[df['count'] == df.groupby('name')['count'].transform('max')]
    

    Output:

          name type   count
    2   robert    z    5123
    3   robert    a    5123
    4  charlie    x  442123
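
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the example frame with ties and applies the boolean-indexing approach above. As an alternative that is not part of the original answer, it also shows `idxmax`, which keeps only the first maximum row per name. The variable names `group_max`, `with_ties`, and `first_only` are just illustrative.

```python
import pandas as pd

# Rebuild the example frame, including the tied maximum for robert.
df = pd.DataFrame({
    "name": ["robert", "robert", "robert", "robert", "charlie", "charlie", "charlie"],
    "type": ["x", "y", "z", "a", "x", "y", "z"],
    "count": [123, 456, 5123, 5123, 442123, 0, 42],
})

# transform('max') broadcasts each group's maximum back onto every row,
# so the comparison yields a boolean mask aligned with df.
group_max = df.groupby("name")["count"].transform("max")
with_ties = df[df["count"] == group_max]
print(with_ties)

# Alternative (an assumption about what you might want, not from the answer):
# idxmax returns the index label of the first maximum in each group,
# so exactly one row per name survives even when there are ties.
first_only = df.loc[df.groupby("name")["count"].idxmax()]
print(first_only)
```

Which one to use depends on whether tied rows matter: the mask keeps all of them, while `idxmax` silently picks the first occurrence. Note that the `idxmax` result is ordered by the group keys, so call `.sort_index()` on it if you need the original row order.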