pythonpandasdataframenumpy

Groupby by sum of revenue and the corresponding highest contributing month - Pandas


I have a bill details data set and I want to do a groupby of the products based on the sum of their Total value, additionally i want a column which indicates the month which has produced the most revenue for the corresponding product

Data set:

Bill_Id  Month  Product_Id  Net_Value
 1        1       20          100
 2        1       20          100
 3        2       20          100
 4        1       30          200
 5        2       30          200
 6        2       30          200

Desired_Result

Product_Id  Total_revenue  Top_Month
   20          300            1
   30          600            2

This just a sample dataset I have the transaction data of the entire year


Solution

  • Pivot the dataframe with aggfunc=sum, then use sum and idxmax along columns axis to find total revenue and month which has the highest contribution to total revenue, finally concat the individual components along column axis to get the result

    s = df.pivot_table('Net_Value', 'Product_Id', 'Month', aggfunc='sum')
    pd.concat([s.sum(1), s.idxmax(1)], axis=1, keys=['Total_revenue', 'Top_Month'])
    

                Total_revenue  Top_Month
    Product_Id                          
    20                    300          1
    30                    600          2