I have following dataframe in which I want to make a cumulative sum on a certain column (Value), together with a group by on field Group, but reset that cumulative sum to 0 when a value in another column (Quantity) is 0.
| Group | Quantity | Value | Cumulative_sum |
|---|---|---|---|
| A | 10 | 200 | 200 |
| B | 5 | 300 | 300 |
| A | 1 | 50 | 250 |
| A | 0 | 100 | 0 |
| C | 5 | 400 | 400 |
| A | 10 | 300 | 300 |
| B | 10 | 200 | 500 |
| A | 15 | 350 | 650 |
I have tried working with blocks like in the code below, but the issue is that it doesn't do the cumulative sum correctly:
blocks = df['Quantity'].eq(0)[::-1].cumsum()[::-1]
df['temp_field'] = (df.groupby(['Group', blocks])
['Value'].cumsum()
.where(df['Quantity']!=0,df['Value'])
)
df['Cumulative_sum'] = np.where(df['Quantity'] == 0, 0, df['temp_field'])
Could anyone help with this?
For the given resetting condition, use groupby.cumsum to create a Reset grouper that tells us when Quantity hits 0 within each Group:
condition = df.Quantity.eq(0)
df['Reset'] = condition.groupby(df.Group).cumsum()
# Group Quantity Value Cumulative_sum Reset
# 0 A 10 200 200 0
# 1 B 5 300 300 0
# 2 A 1 50 250 0
# 3 A 0 100 0 1
# 4 C 5 400 400 0
# 5 A 10 300 300 1
# 6 B 10 200 500 0
# 7 A 15 350 650 1
mask the Value column whenever the resetting condition is met and use another groupby.cumsum on both Group and Reset:
df['Cumul'] = df.Value.mask(condition, 0).groupby([df.Group, df.Reset]).cumsum()
# Group Quantity Value Cumulative_sum Reset Cumul
# 0 A 10 200 200 0 200
# 1 B 5 300 300 0 300
# 2 A 1 50 250 0 250
# 3 A 0 100 0 1 0
# 4 C 5 400 400 0 400
# 5 A 10 300 300 1 300
# 6 B 10 200 500 0 500
# 7 A 15 350 650 1 650