I have a data frame that looks like this.
import pandas as pd
import numpy as np
data = [
['A',1,2,3,4],
['A',5,6,7,8],
['A',9,10,11,12],
['B',13,14,15,16],
['B',17,18,19,20],
['B',21,22,23,24],
['B',25,26,27,28],
['C',29,30,31,32],
['C',33,34,35,36],
['C',37,38,39,40],
['D',13,14,15,0],
['D',0,18,19,0],
['D',0,0,23,0],
['D',0,0,0,0],
['E',13,14,15,0],
['E',0,18,19,0],
['F',0,0,23,0],
]
df = pd.DataFrame(data, columns=['Name', 'num1', 'num2', 'num3', 'num4'])
df
Then I have the following code to calculate the group by weighted average.
weights = [10,20,30,40]
df=df.groupby('Name').agg(lambda g: sum(g*weights[:len(g)])/sum(weights[:len(g)]))
The problem lies in sum(weights[:len(g)])
because all the groups do not have equal rows. As you can see above, group A has 3 rows, B has 4 rows, C has 3 rows, D has 4 rows, E has 2 rows and F has 1 row
.
Depending upon the rows, it needs to calculate the sum.
Now, the above code returns me the weighted average by calculating
For Group A, the first column calculates the weighted average as (1 X 10+5 X 20+9 X 30)/60 but it should calculate the weighted average as (1 X20+5 X 30+9 X 40)/90
For Group E, the first column calculates the weighted average as (13 X 10+0 X 20)/30 but it should calculate the weighted average as (13 X 30+0 X 40)/70
Current Result
Expected result
i edit your code little bit
n = len(weights)
df=df.groupby('Name').agg(lambda g: sum(g*weights[n-len(g):])/sum(weights[n-len(g):]))
output(df
):
num1 num2 num3 num4
Name
A 5.9 6.9 7.9 8.9
B 21.0 22.0 23.0 24.0
C 33.9 34.9 35.9 36.9
D 1.3 5.0 12.2 0.0
E 5.6 16.3 17.3 0.0
F 0.0 0.0 23.0 0.0