pythonpandasmax

Pandas group by - maximum row for a subset


I have a dataframe with weekly product sales

product_id week_number sales
A1 1 1000
A1 2 2000
A1 3 3000
A2 1 8000
A2 2 4000
A2 3 2000

I want to add a column that identifies rows where the total sales were the highest for the given product:

product_id week_number sales product_max
A1 1 1000 FALSE
A1 2 2000 FALSE
A1 3 3000 TRUE
A2 1 8000 TRUE
A2 2 4000 FALSE
A2 3 2000 FALSE

Since A1 had its highest sales on week 3, that row is tagged as True. But week 3 wasn't the highest for A2. (And so for A2, week 1 is tagged as True.)

I know I could write a loop to do this and cycle through each of the product IDs one by one, but I am wondering if there is a way to do this with a different function - possibly Pandas Groupby?

Thank you!


Solution

  • Answer

    df['product_max'] = (
        df.groupby('product_id')['sales'].transform('max')
          .eq(df['sales'])
    )
    

    df

      product_id  week_number  sales  product_max
    0         A1            1   1000        False
    1         A1            2   2000        False
    2         A1            3   3000         True
    3         A2            1   8000         True
    4         A2            2   4000        False
    5         A2            3   2000        False
    

    Example Code

    import pandas as pd
    data = {'product_id': ['A1', 'A1', 'A1', 'A2', 'A2', 'A2'], 'week_number': [1, 2, 3, 1, 2, 3], 'sales': [1000, 2000, 3000, 8000, 4000, 2000]}
    df = pd.DataFrame(data)