pandasloops

Pandas get summation only on required cell of a columns


I have following table with "Rank" & "rice" columns as dataframe in pandas.

I want to have a sum in outcome col on the first positive rice col number. Also ignore the sum if current rice col is positive & above that is also positive.

Outcome col is the outcome which is needed.

Rank    rice    Outcome
------------------------
 1.00   -41.05  
 2.00   763.1   722.05
 3.00   -702.1  
 4.00   -96.35  
 5.00   -670.1  
 6.00   -699.6  
 7.00   -642.1  
 8.00   -31.05  
 9.00   -69.15  
 10.00  -49.15  
 11.00  -539.3  
 12.00  -30.35  
 13.00  -530.1  
 14.00  569.1   -133
 15.00  -600.1  
 16.00  601.1   1
 17.00  620.1   
 18.00  -46.65  
 19.00  -615.2  
 20.00  -581.8  
 21.00  -562.8  
 22.00  538.2   491.55
 23.00  79.85   
 24.00  -582.3  
 25.00  -615.1  
 26.00  22.85   102.7
 27.00  -583.5  
 28.00  -578.1  

I am using following code.

import pandas as pd


data = [['1',-41.05],['2',763.1],['3',-702.1],['4',-96.35],['5',-670.1],['6',-699.6],['7',-642.1],['8',-31.05],['9',-69.15],['10',-49.15],['11',-539.3],['12',-30.35],['13',-530.1],['14',569.1],['15',-600.1],['16',601.1],['17',620.1],['18',-46.65],['19',-615.2],['20',-581.8],['21',-562.8],['22',538.2],['23',79.85],['24',-582.3],['25',-615.1],['26',22.85],['27',-583.5],['28',-578.1]]

df=pd.DataFrame(data, columns=['Rank', 'rice'])
df.loc[df['rice'] >0 , 'outcome'] = df['rice']-df['rice'].shift()

However the outcome is not as per the desired outcome col. Please help.


Solution

  • There is something not logic in either 491.55 or 102.7.

    Anyway, you could use custom groups/mask and groupby.transform. I'm providing two alternatives (if you want to ignore or not the successive positive values):

    m = df['rice'].gt(0)
    m2 = (m&~m.shift(fill_value=False))
    df.loc[m2, 'Outcome'] = (df['rice']
       #.mask(m&~m2)
       .groupby(m2[::-1].cumsum())
       .transform('first').add(df['rice'])
    )
    

    Output (Outcome2 is with the uncommented .mask(m&~m2)):

       Rank    rice  Outcome  Outcome2
    0     1  -41.05      NaN       NaN
    1     2  763.10   722.05    722.05
    2     3 -702.10      NaN       NaN
    3     4  -96.35      NaN       NaN
    4     5 -670.10      NaN       NaN
    5     6 -699.60      NaN       NaN
    6     7 -642.10      NaN       NaN
    7     8  -31.05      NaN       NaN
    8     9  -69.15      NaN       NaN
    9    10  -49.15      NaN       NaN
    10   11 -539.30      NaN       NaN
    11   12  -30.35      NaN       NaN
    12   13 -530.10      NaN       NaN
    13   14  569.10  -133.00   -133.00
    14   15 -600.10      NaN       NaN
    15   16  601.10     1.00      1.00
    16   17  620.10      NaN       NaN
    17   18  -46.65      NaN       NaN
    18   19 -615.20      NaN       NaN
    19   20 -581.80      NaN       NaN
    20   21 -562.80      NaN       NaN
    21   22  538.20  1158.30    491.55
    22   23   79.85      NaN       NaN
    23   24 -582.30      NaN       NaN
    24   25 -615.10      NaN       NaN
    25   26   22.85   102.70   -559.45
    26   27 -583.50      NaN       NaN
    27   28 -578.10      NaN       NaN
    

    If you want to avoid having a double value if the first row is positive:

    m = df['rice'].gt(0)
    m2 = (m&~m.shift(fill_value=False))
    
    g = (df['rice']
       .mask(m&~m2)
       .groupby(m2[::-1].cumsum())
    )
    
    df.loc[m2, 'Outcome'] = (g.transform('first')
                              .mask(g.transform('size').eq(1), 0)
                              .add(df['rice'])
                            )
    

    Example (with row 0 removed):

        Rank    rice  Outcome
    1      2  763.10   763.10
    2      3 -702.10      NaN
    3      4  -96.35      NaN
    4      5 -670.10      NaN
    5      6 -699.60      NaN
    6      7 -642.10      NaN
    7      8  -31.05      NaN
    8      9  -69.15      NaN
    9     10  -49.15      NaN
    10    11 -539.30      NaN
    11    12  -30.35      NaN
    12    13 -530.10      NaN
    13    14  569.10  -133.00
    14    15 -600.10      NaN
    15    16  601.10     1.00
    16    17  620.10      NaN
    17    18  -46.65      NaN
    18    19 -615.20      NaN
    19    20 -581.80      NaN
    20    21 -562.80      NaN
    21    22  538.20   491.55
    22    23   79.85      NaN
    23    24 -582.30      NaN
    24    25 -615.10      NaN
    25    26   22.85  -559.45
    26    27 -583.50      NaN
    27    28 -578.10      NaN