pythonpandas

Extract row with maximum value in a group pandas dataframe


A similar question is asked here: Python : Getting the Row which has the max value in groups using groupby

However, I just need one record per group even if there are more than one record with maximum value in that group.

In the example below, I need one record for "s2". For me it doesn't matter which one.

>>> df = DataFrame({'Sp':['a','b','c','d','e','f'], 'Mt':['s1', 's1', 's2','s2','s2','s3'], 'Value':[1,2,3,4,5,6], 'count':[3,2,5,10,10,6]})
>>> df
   Mt Sp  Value  count
0  s1  a      1      3
1  s1  b      2      2
2  s2  c      3      5
3  s2  d      4     10
4  s2  e      5     10
5  s3  f      6      6
>>> idx = df.groupby(['Mt'])['count'].transform(max) == df['count']
>>> df[idx]
   Mt Sp  Value  count
0  s1  a      1      3
3  s2  d      4     10
4  s2  e      5     10
5  s3  f      6      6
>>> 

Solution

  • You can use first

    In [14]: df.groupby('Mt').first()
    Out[14]: 
       Sp  Value  count
    Mt                 
    s1  a      1      3
    s2  c      3      5
    s3  f      6      6
    

    Update

    Set as_index=False to achieve your goal

    In [28]: df.groupby('Mt', as_index=False).first()
    Out[28]: 
       Mt Sp  Value  count
    0  s1  a      1      3
    1  s2  c      3      5
    2  s3  f      6      6 
    

    Update Again

    Sorry for misunderstanding what you mean. You can sort it first if you want the one with max count in a group

    In [196]: df.sort('count', ascending=False).groupby('Mt', as_index=False).first()
    Out[196]: 
       Mt Sp  Value  count
    0  s1  a      1      3
    1  s2  e      5     10
    2  s3  f      6      6