Sample dataset:
id val
0 9 1
1 9 0
2 9 4
3 9 6
4 9 2
5 9 3
6 5 0
7 5 1
8 5 6
9 5 2
10 5 4
From the dataset, I want to generate a column sum
. For the first 3 rows: sum
=sum
+val
(group by id). From 4th row, each row contains the cumulative sum of the previous 3 rows of val
column(group by id). Loop through each row. When a new id appears, it should calculate from the beginning.
Desired Output:
id val sum
0 9 1 1
1 9 0 1
2 9 4 5
3 9 6 10
4 9 2 12
5 9 3 11
6 5 0 0
7 5 1 1
8 5 6 7
9 5 2 9
10 5 4 12
Code I tried:
df['sum']=df['val'].rolling(min_periods=1, window=3).groupby(df['id']).cumsum()
How do I figure out the custom cumulative sum function?
Are you sure the expected output is correct?
I would do:
df['sum'] = df.groupby('id')['val'].rolling(min_periods=1, window=3).sum().values
output:
id val sum
0 5 1 1.0
1 5 0 1.0
2 5 4 5.0
3 5 6 10.0
4 5 2 12.0
5 5 3 11.0
6 9 0 0.0
7 9 1 1.0
8 9 6 7.0
9 9 2 9.0
10 9 4 12.0