pythonpandasdataframeexploratory-data-analysis

Pandas Groupby and Compare rows to find maximum value


I've a dataframe

a b c
one 6 11
one 7 12
two 8 23
two 9 14
three 10 15
three 20 25

I want to apply groupby at column a and then find the highest value in column c, so that, the highest value gets flagged, i.e.

a b c
one 6 11
one 7 12

Compare value 11&12, then

a b c
two 8 23
two 9 14

Compare value 23&14, then

a b c
three 10 15
three 20 25

Finally resulting in:

a b c flag
one 6 11 no
one 7 12 yes
two 8 23 yes
two 9 14 no
three 10 15 no
three 20 25 yes

I/P DF:

df = pd.DataFrame({
    'a':["one","one","two","two","three","three"]
    , 'b':[6,7,8,9,10,20]
    , 'c':[11,12,23,14,15,25]
    # , 'flag': ['no', 'yes', 'yes', 'no', 'no', 'yes']
})
df

Solution

  • You can use groupby.transform to get the max value per group, and numpy.where to map the True/False to 'yes'/'no':

    df['flag'] = np.where(df.groupby('a')['c'].transform('max').eq(df['c']), 'yes', 'no')
    

    output:

           a   b   c flag
    0    one   6  11   no
    1    one   7  12  yes
    2    two   8  23  yes
    3    two   9  14   no
    4  three  10  15   no
    5  three  20  25  yes
    

    Intermediates:

    df.groupby('a')['c'].transform('max')
    
    0    12
    1    12
    2    23
    3    23
    4    25
    5    25
    Name: c, dtype: int64
    
    df.groupby('a')['c'].transform('max').eq(df['c'])
    0    False
    1     True
    2     True
    3    False
    4    False
    5     True
    Name: c, dtype: bool