pythonpandasdataframedata-science-experience

How to take a sum (in denominator) for calculating group by weighted average in a dataframe?


I have a data frame that looks like this.

import pandas as pd
import numpy as np

data = [
  ['A',1,2,3,4],
  ['A',5,6,7,8],
  ['A',9,10,11,12],
  ['B',13,14,15,16],
  ['B',17,18,19,20],
  ['B',21,22,23,24],
  ['B',25,26,27,28],
  ['C',29,30,31,32],
  ['C',33,34,35,36],
  ['C',37,38,39,40],
  ['D',13,14,15,0],
  ['D',0,18,19,0],
  ['D',0,0,23,0],
  ['D',0,0,0,0],
  ['E',13,14,15,0],
  ['E',0,18,19,0],
  ['F',0,0,23,0],

]

df = pd.DataFrame(data, columns=['Name', 'num1', 'num2', 'num3', 'num4'])
df

Then I have the following code to calculate the group by weighted average.

weights = [10,20,30,40] 

df=df.groupby('Name').agg(lambda g: sum(g*weights[:len(g)])/sum(weights[:len(g)]))

The problem lies in sum(weights[:len(g)]) because all the groups do not have equal rows. As you can see above, group A has 3 rows, B has 4 rows, C has 3 rows, D has 4 rows, E has 2 rows and F has 1 row. Depending upon the rows, it needs to calculate the sum. Now, the above code returns me the weighted average by calculating

For Group A, the first column calculates the weighted average as (1 X 10+5 X 20+9 X 30)/60 but it should calculate the weighted average as (1 X20+5 X 30+9 X 40)/90

For Group E, the first column calculates the weighted average as (13 X 10+0 X 20)/30 but it should calculate the weighted average as (13 X 30+0 X 40)/70

Current Result

enter image description here

Expected result

enter image description here


Solution

  • i edit your code little bit

    n = len(weights)
    df=df.groupby('Name').agg(lambda g: sum(g*weights[n-len(g):])/sum(weights[n-len(g):]))
    

    output(df):

        num1    num2    num3    num4
    Name                
    A   5.9     6.9     7.9     8.9
    B   21.0    22.0    23.0    24.0
    C   33.9    34.9    35.9    36.9
    D   1.3     5.0     12.2    0.0
    E   5.6     16.3    17.3    0.0
    F   0.0     0.0     23.0    0.0