pythonpandasjupyter

cumulative sum based on conditions of 2 neighboring colum


Apologies, I'm a newbie to coding/pandas/python.

I am trying to sum the column only if there is a "1" located in the "buy" and "sell" neighboring columns

Date Buy Sell Value Cumulative Sum
01/01/2023 1
01/02/2023 1 5 5
01/03/23 1 6
01/04/23 1 7
01/05/23 1 1 8
01/06/23 5

I saw that there is a formula called **cumsum(axis=None, skipna=True, *args, kwargs), that might be able to do this. but unfortunately, I'm not too sure how to include the conditional on this based on the buy and sell columns

I'm not sure how to do this with pandas formulas. Also would i maybe need to utilze a loop for this?

Any suggestions or tips would be greatly appreciated!!


Solution

  • Try to avoid loops. I think this is what you are looking for:

    df1 = pd.DataFrame({
                        'date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06', '2023-01-07', '2023-01-08', '2023-01-09', '2023-01-10', '2023-01-11', '2023-01-12'], 
                        'buy': [None,1,None,None,None,None,None,1,None,None,None,None],
                        'sell':[None,None,None,None,1,None,None,None,None,None,1,None],
                        'value':[1,5,1,1,1,5,1,5,1,1,1,5]
                        }
                      )
    
    # make sell negative so it can be discerned from buy
    df1['sell'] = df1['sell'] * -1
            
    print(df1)
              date  buy  sell  value
    0   2023-01-01  NaN   NaN      1
    1   2023-01-02  1.0   NaN      5
    2   2023-01-03  NaN   NaN      1
    3   2023-01-04  NaN   NaN      1
    4   2023-01-05  NaN  -1.0      1
    5   2023-01-06  NaN   NaN      5
    6   2023-01-07  NaN   NaN      1
    7   2023-01-08  1.0   NaN      5
    8   2023-01-09  NaN   NaN      1
    9   2023-01-10  NaN   NaN      1
    10  2023-01-11  NaN  -1.0      1
    11  2023-01-12  NaN   NaN      5
            
    # create 'buysell' column which is combination of 'buy' and 'sell' columns
            
    df1['buysell'] = df1['buy'].combine_first(df1['sell'])
            
    # use ffill() to fill buysell from 1 until it is not 1
    df1.loc[df1['buysell'].ffill() == 1, 'buysell'] = 1
            
            
    # create a mask for where 'buysell' is not NaN
    mask = ~df1['buysell'].isna()
                
    # use the mask to create a 'buysellvalue' column with the contents of 'value' column for rows where the mask is true
    df1.loc[mask, 'buysellvalue'] = df1.loc[mask, 'value']
                
    # use cumsum()
    df1['cumbuysellvalue'] = df1['buysellvalue'].cumsum()
                
    print(df1)
              date  buy  sell  value  buysell  buysellval  cumbuysellval
    0   2023-01-01  NaN   NaN      1      NaN         NaN            NaN
    1   2023-01-02  1.0   NaN      5      1.0         5.0            5.0
    2   2023-01-03  NaN   NaN      1      1.0         1.0            6.0
    3   2023-01-04  NaN   NaN      1      1.0         1.0            7.0
    4   2023-01-05  NaN  -1.0      1     -1.0         1.0            8.0
    5   2023-01-06  NaN   NaN      5      NaN         NaN            NaN
    6   2023-01-07  NaN   NaN      1      NaN         NaN            NaN
    7   2023-01-08  1.0   NaN      5      1.0         5.0           13.0
    8   2023-01-09  NaN   NaN      1      1.0         1.0           14.0
    9   2023-01-10  NaN   NaN      1      1.0         1.0           15.0
    10  2023-01-11  NaN  -1.0      1     -1.0         1.0           16.0
    11  2023-01-12  NaN   NaN      5      NaN         NaN            NaN