Let's say I have a dataframe like this:
a b c
0 x1 y1 9
1 x1 y2 9
2 x1 y3 4
3 x2 y4 2
4 x2 y5 10
5 x2 y6 5
6 x3 y7 6
7 x3 y8 4
8 x3 y9 8
9 x4 y10 11
10 x4 y11 11
11 x4 y12 11
I first want to do a grouped sort of column c (grouped by column a), and then I want to retain all the rows in each group that have the highest values of column c. So the output will look like:
a b c
0 x1 y1 9
1 x1 y2 9
4 x2 y5 10
8 x3 y9 8
9 x4 y10 11
10 x4 y11 11
11 x4 y12 11
Is there a clean way of doing so without using any loops, etc.?
You could groupby column a and find the max per group, and merge back the resulting dataframe to keep the matching rows:
df.merge(df.groupby('a').c.max().reset_index(), on=['a', 'c'])
a b c
0 x1 y1 9
1 x1 y2 9
2 x2 y5 10
3 x3 y9 8
4 x4 y10 11
5 x4 y11 11
6 x4 y12 11
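If you also want to keep the original index labels (0, 1, 4, 8, ...) as in the desired output of the question, one alternative is to compare column c against the per-group maximum broadcast back with transform. This is only a minimal sketch, not the only way to do it; the variable names group_max and result are just illustrative, and the dataframe is rebuilt from the sample data in the question so that the snippet runs on its own.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "a": ["x1"] * 3 + ["x2"] * 3 + ["x3"] * 3 + ["x4"] * 3,
    "b": [f"y{i}" for i in range(1, 13)],
    "c": [9, 9, 4, 2, 10, 5, 6, 4, 8, 11, 11, 11],
})

# Broadcast each group's maximum of c back onto every row of that group.
# The result is a Series aligned with df's index.
group_max = df.groupby("a")["c"].transform("max")

# Keep the rows whose c equals the maximum of their own group.
# Ties are all retained, and the original index labels are preserved.
result = df[df["c"] == group_max]

print(result)
```

Because the comparison is done row by row against the maximum of the row's own group, a value that happens to equal the maximum of a different group is not picked up by mistake.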