pythonpandascumsumrolling-sum

Cumulative sum when value in other column is non-zero


I have a huge dataframe with multiple columns as below . I am trying to find cummulative sum of time when value in b is -1 which is preceeded by 1. I have tried cummulative sum but it's not working.

I am looking for cumulative sum of values only when the values in b are -1 for that group alone. Once the value in b changes to 0 I want cummulative sum to reset to 0

My cummulative sum values should match with the values in cummulative_sum_column

**date time**    **id**  **value**  **time**           **a****b**   **cummulative sum**
11/1/2021 0:05   v01111     4           0               0     0
11/1/2021 1:30   v01111     11      57.302              1     1 
11/1/2021 4:19   v01111     1       10129.88            0    -1 
11/1/2021 4:19   v01111     1       0.49                0    -1        10130.37
11/1/2021 9:30   v01111     2       0.202               0     0 
11/1/2021 9:31   v01111     11      58.699              1     1 
11/1/2021 10:31  v01111     1       3573.728            0    -1 
11/1/2021 10:31  v01111     1       0.29                0    -1       3574.018
11/2/2021 21:15  v01111     2       0.2                 0     0 
12/9/2021 20:17  v01112     11      58.525              1    0  
12/9/2021 20:53  v01113     11      2145.745            0    -1 
12/9/2021 20:53  v01114     1       0.016               0    -1           2146
12/9/2021 22:05  v01150     0       0.01                0    0  
12/9/2021 22:05  v01151     0       1.188               0    0  
12/9/2021 22:05  v01152     1       0.312               0   0   
12/9/2021 22:05  v01153     1       1.49                0   -1  
12/9/2021 22:09  v01153     1       4                   1   0   
12/10/2021 22:07 v01153     1       6                   0   -1            6

I tried below but the code calculates cummulative sum of all values in when the values in b == -1 it doesnot reset to 0 once the value in b is 0

df['test']= df.loc[(df['b'] == -1) & (df['b'].shift(+1) == 1), 'time'].cumsum()

Solution

  • We group by runs of the same value in 'b' and apply the sum to groups; then we zero out the sums where 'b' is not -1. 'cumsum' shows the sum of individual groups of 'b'==-1 for each row in the group

    grouper = (df['b'] != df['b'].shift()).cumsum()
    df['cumsum'] = df.groupby(grouper, group_keys = False)['time'].transform(sum)
    df.loc[df['b'] != -1,'cumsum'] = 0
    df
    

    output

             time    a    b    cumsum
    --  ---------  ---  ---  --------
     0      0        0    0      0
     1     57.302    1    1      0
     2  10129.9      0   -1  10130.4
     3      0.49     0   -1  10130.4
     4      0.202    0    0      0
     5     58.699    1    1      0
     6   3573.73     0   -1   3574.02
     7      0.29     0   -1   3574.02
     8      0.2      0    0      0
     9     58.525    1    0      0
    10   2145.74     0   -1   2145.76
    11      0.016    0   -1   2145.76
    12      0.01     0    0      0
    13      1.188    0    0      0
    14      0.312    0    0      0
    15      1.49     0   -1      1.49
    16      4        1    0      0
    17      6        0   -1      6