pandas

Pandas- Create column based on sum of previous row values


Sample dataset:

   id val
0   9   1
1   9   0
2   9   4
3   9   6
4   9   2
5   9   3
6   5   0
7   5   1
8   5   6
9   5   2
10  5   4

From the dataset, I want to generate a column sum. For the first 3 rows: sum=sum+val(group by id). From 4th row, each row contains the cumulative sum of the previous 3 rows of val column(group by id). Loop through each row. When a new id appears, it should calculate from the beginning.

Desired Output:

    id  val sum
0    9    1   1
1    9    0   1
2    9    4   5
3    9    6  10
4    9    2  12
5    9    3  11
6    5    0   0
7    5    1   1
8    5    6   7
9    5    2   9
10   5    4  12

Code I tried:

df['sum']=df['val'].rolling(min_periods=1, window=3).groupby(df['id']).cumsum()

How do I figure out the custom cumulative sum function?


Solution

  • Are you sure the expected output is correct?

    I would do:

    df['sum'] = df.groupby('id')['val'].rolling(min_periods=1, window=3).sum().values
    

    output:

        id  val   sum
    0    5    1   1.0
    1    5    0   1.0
    2    5    4   5.0
    3    5    6  10.0
    4    5    2  12.0
    5    5    3  11.0
    6    9    0   0.0
    7    9    1   1.0
    8    9    6   7.0
    9    9    2   9.0
    10   9    4  12.0