I'm trying to create a new column that gives a rolling sum of values in the Values
column. The rolling sum includes 4 rows i.e. the current row and the next three rows. I want to do this for each type in the 'Type' column.
However, if there are fewer than 4 rows before the next type starts, I want the rolling sum to use only the remaining rows. For example, if there are 2 rows after the current row for the current type, a total of 3 rows is used for the rolling sum. See the table below showing what I'm currently getting and what I expect.
Index | Type | Value | Current Rolling Sum | Expected Rolling Sum |
---|---|---|---|---|
1 | left | 5 | 22 | 22 |
2 | left | 9 | 34 | 34 |
3 | left | 0 | NaN | 25 |
4 | left | 8 | NaN | 25 |
5 | left | 17 | NaN | 17 |
6 | straight | 7 | 61 | 61 |
7 | straight | 4 | 77 | 77 |
8 | straight | 0 | 86 | 86 |
9 | straight | 50 | 97 | 97 |
10 | straight | 23 | NaN | 47 |
11 | straight | 13 | NaN | 24 |
12 | straight | 11 | NaN | 11 |
The following line of code is what I'm currently using to get the rolling sum.
rolling_sum = df.groupby('Type', sort=False)['Value'].rolling(4, min_periods = 3).sum().shift(-3).reset_index()
rolling_sum = rolling_sum.rename(columns={'Value': 'Rolling Sum'})
extracted_col = rolling_sum['Rolling Sum']
df = df.join(extracted_col)
I would really appreciate your help.
You can try running the rolling sum on the reversed values for each group and then reverse back afterward, using a min_periods
of 1:
df['Rolling Sum'] = df.groupby('Type', sort=False)['Value'].apply(lambda x: x[::-1].rolling(4, min_periods=1).sum()[::-1])
Result:
Index Type Value Rolling Sum
0 1 left 5 22.0
1 2 left 9 34.0
2 3 left 0 25.0
3 4 left 8 25.0
4 5 left 17 17.0
5 6 straight 7 61.0
6 7 straight 4 77.0
7 8 straight 0 86.0
8 9 straight 50 97.0
9 10 straight 23 47.0
10 11 straight 13 24.0
11 12 straight 11 11.0