[SOLVED] Pandas group by - maximum row for a subset

Pandas group by - maximum row for a subset

I have a dataframe with weekly product sales

product_id	week_number	sales
A1	1	1000
A1	2	2000
A1	3	3000
A2	1	8000
A2	2	4000
A2	3	2000

I want to add a column that identifies rows where the total sales were the highest for the given product:

product_id	week_number	sales	product_max
A1	1	1000	FALSE
A1	2	2000	FALSE
A1	3	3000	TRUE
A2	1	8000	TRUE
A2	2	4000	FALSE
A2	3	2000	FALSE

Since A1 had its highest sales on week 3, that row is tagged as True. But week 3 wasn't the highest for A2. (And so for A2, week 1 is tagged as True.)

I know I could write a loop to do this and cycle through each of the product IDs one by one, but I am wondering if there is a way to do this with a different function - possibly Pandas Groupby?

Thank you!

Solution

Answer

df['product_max'] = (
    df.groupby('product_id')['sales'].transform('max')
      .eq(df['sales'])
)

  product_id  week_number  sales  product_max
0         A1            1   1000        False
1         A1            2   2000        False
2         A1            3   3000         True
3         A2            1   8000         True
4         A2            2   4000        False
5         A2            3   2000        False

Example Code

import pandas as pd
data = {'product_id': ['A1', 'A1', 'A1', 'A2', 'A2', 'A2'], 'week_number': [1, 2, 3, 1, 2, 3], 'sales': [1000, 2000, 3000, 8000, 4000, 2000]}
df = pd.DataFrame(data)