I have a dataframe as below.
Part Quantity Inventory
AAA -1 8
AAA -1 NaN
AAA -5 NaN
AAA 10 NaN
AAA -3 NaN
BBB 5 8
BBB -3 NaN
BBB -3 NaN
BBB 5 NaN
BBB -3 NaN
I have the following code to replace the NaN values with the cumulative sum from the 'Quantity' column and starting value is first non-NaN in 'Inventory':
df.loc[~df['Inventory'].isna(), 'Quantity'] = df['Inventory']
mask = ~df['Inventory'].isna()
group = mask.cumsum()
df['Inventory'] = df.groupby(group)['Quantity'].cumsum()
df.loc[mask, 'Inventory'] = df['Quantity']
This gives me as result:
Part Quantity Inventory
AAA 8 8
AAA -1 7
AAA -5 2
AAA 10 12
AAA -3 9
BBB 7 7
BBB -3 4
BBB -3 1
BBB 5 6
BBB -3 3
Which is all good, except that the first value of each 'Part' group in the 'Quantity column is replaced by the first value in the 'Inventory' column.
Any suggestions on how to avoid this and keep the initial value in the 'Quantity' column?
Rewrite your logic to avoid modifying the original column:
mask = ~df['Inventory'].isna()
group = mask.cumsum()
df.loc[~mask, 'Inventory'] = (df['Inventory'].fillna(df['Quantity'])
.groupby(group).cumsum())
Output:
Part Quantity Inventory
0 AAA -1 8.0
1 AAA -1 7.0
2 AAA -5 2.0
3 AAA 10 12.0
4 AAA -3 9.0
5 BBB 5 8.0
6 BBB -3 5.0
7 BBB -3 2.0
8 BBB 5 7.0
9 BBB -3 4.0