I have a bill details data set and I want to do a groupby of the products based on the sum of their Total value, additionally i want a column which indicates the month which has produced the most revenue for the corresponding product
Data set:
Bill_Id Month Product_Id Net_Value
1 1 20 100
2 1 20 100
3 2 20 100
4 1 30 200
5 2 30 200
6 2 30 200
Desired_Result
Product_Id Total_revenue Top_Month
20 300 1
30 600 2
This just a sample dataset I have the transaction data of the entire year
Pivot the dataframe with aggfunc=sum
, then use sum
and idxmax
along columns axis to find total revenue and month which has the highest contribution to total revenue, finally concat
the individual components along column axis to get the result
s = df.pivot_table('Net_Value', 'Product_Id', 'Month', aggfunc='sum')
pd.concat([s.sum(1), s.idxmax(1)], axis=1, keys=['Total_revenue', 'Top_Month'])
Total_revenue Top_Month
Product_Id
20 300 1
30 600 2