python pandas

Keep maximum value per group including repetitions


Let's say I have a dataframe like this:

    a   b   c
0   x1  y1  9
1   x1  y2  9
2   x1  y3  4
3   x2  y4  2
4   x2  y5  10
5   x2  y6  5
6   x3  y7  6
7   x3  y8  4
8   x3  y9  8
9   x4  y10 11
10  x4  y11 11
11  x4  y12 11

I want to group by column a and, within each group, retain every row whose value in column c equals that group's maximum, so that ties are all kept. The output should look like:

    a   b   c
0   x1  y1  9
1   x1  y2  9
4   x2  y5  10
8   x3  y9  8
9   x4  y10 11
10  x4  y11 11
11  x4  y12 11

Is there a clean way of doing so without using any loops, etc.?


Solution

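  • You could group by column a, take the max of c per group, and merge the result back onto the original dataframe to keep only the matching rows. Passing as_index=False keeps a as a regular column, so the merge matches on both a and c rather than on c alone (which would also keep rows from other groups that happen to share a max value):

    df.merge(df.groupby('a', as_index=False)['c'].max(), on=['a', 'c'])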
    
        a    b   c
    0  x1   y1   9
    1  x1   y2   9
    2  x2   y5  10
    3  x3   y9   8
    4  x4  y10  11
    5  x4  y11  11
    6  x4  y12  11
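
Note that the merge produces a fresh 0..6 index, whereas the expected output in the question keeps the original row labels. If that matters, one possible alternative (a sketch, not part of the answer above) is to broadcast each group's max back to its rows with `transform` and filter with a boolean mask. The dataframe below is rebuilt from the question's data; the names `group_max` and `result` are only illustrative.

```python
import pandas as pd

# Rebuild the example dataframe from the question.
df = pd.DataFrame({
    "a": ["x1"] * 3 + ["x2"] * 3 + ["x3"] * 3 + ["x4"] * 3,
    "b": [f"y{i}" for i in range(1, 13)],
    "c": [9, 9, 4, 2, 10, 5, 6, 4, 8, 11, 11, 11],
})

# transform("max") returns a Series aligned with df's index,
# holding the max of c for the group each row belongs to.
group_max = df.groupby("a")["c"].transform("max")

# Keep the rows whose c equals their group's max; ties are all retained
# and the original index labels are preserved.
result = df[df["c"] == group_max]
print(result)
```

Because no merge is involved, rows stay in their original order and no extra key columns need to be matched.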