I have a dataframe with weekly product sales
product_id | week_number | sales |
---|---|---|
A1 | 1 | 1000 |
A1 | 2 | 2000 |
A1 | 3 | 3000 |
A2 | 1 | 8000 |
A2 | 2 | 4000 |
A2 | 3 | 2000 |
I want to add a column that identifies rows where the total sales were the highest for the given product:
product_id | week_number | sales | product_max |
---|---|---|---|
A1 | 1 | 1000 | FALSE |
A1 | 2 | 2000 | FALSE |
A1 | 3 | 3000 | TRUE |
A2 | 1 | 8000 | TRUE |
A2 | 2 | 4000 | FALSE |
A2 | 3 | 2000 | FALSE |
Since A1 had its highest sales on week 3, that row is tagged as True. But week 3 wasn't the highest for A2. (And so for A2, week 1 is tagged as True.)
I know I could write a loop to do this and cycle through each of the product IDs one by one, but I am wondering if there is a way to do this with a different function - possibly Pandas Groupby?
Thank you!
Answer
df['product_max'] = (
df.groupby('product_id')['sales'].transform('max')
.eq(df['sales'])
)
df
product_id week_number sales product_max
0 A1 1 1000 False
1 A1 2 2000 False
2 A1 3 3000 True
3 A2 1 8000 True
4 A2 2 4000 False
5 A2 3 2000 False
Example Code
import pandas as pd
data = {'product_id': ['A1', 'A1', 'A1', 'A2', 'A2', 'A2'], 'week_number': [1, 2, 3, 1, 2, 3], 'sales': [1000, 2000, 3000, 8000, 4000, 2000]}
df = pd.DataFrame(data)