I have following table with "Rank" & "rice" columns as dataframe in pandas.
I want to have a sum in outcome col on the first positive rice col number. Also ignore the sum if current rice col is positive & above that is also positive.
Outcome col is the outcome which is needed.
Rank rice Outcome
------------------------
1.00 -41.05
2.00 763.1 722.05
3.00 -702.1
4.00 -96.35
5.00 -670.1
6.00 -699.6
7.00 -642.1
8.00 -31.05
9.00 -69.15
10.00 -49.15
11.00 -539.3
12.00 -30.35
13.00 -530.1
14.00 569.1 -133
15.00 -600.1
16.00 601.1 1
17.00 620.1
18.00 -46.65
19.00 -615.2
20.00 -581.8
21.00 -562.8
22.00 538.2 491.55
23.00 79.85
24.00 -582.3
25.00 -615.1
26.00 22.85 102.7
27.00 -583.5
28.00 -578.1
I am using following code.
import pandas as pd
data = [['1',-41.05],['2',763.1],['3',-702.1],['4',-96.35],['5',-670.1],['6',-699.6],['7',-642.1],['8',-31.05],['9',-69.15],['10',-49.15],['11',-539.3],['12',-30.35],['13',-530.1],['14',569.1],['15',-600.1],['16',601.1],['17',620.1],['18',-46.65],['19',-615.2],['20',-581.8],['21',-562.8],['22',538.2],['23',79.85],['24',-582.3],['25',-615.1],['26',22.85],['27',-583.5],['28',-578.1]]
df=pd.DataFrame(data, columns=['Rank', 'rice'])
df.loc[df['rice'] >0 , 'outcome'] = df['rice']-df['rice'].shift()
However the outcome is not as per the desired outcome col. Please help.
There is something not logic in either 491.55 or 102.7.
Anyway, you could use custom groups/mask and groupby.transform
. I'm providing two alternatives (if you want to ignore or not the successive positive values):
m = df['rice'].gt(0)
m2 = (m&~m.shift(fill_value=False))
df.loc[m2, 'Outcome'] = (df['rice']
#.mask(m&~m2)
.groupby(m2[::-1].cumsum())
.transform('first').add(df['rice'])
)
Output (Outcome2 is with the uncommented .mask(m&~m2)
):
Rank rice Outcome Outcome2
0 1 -41.05 NaN NaN
1 2 763.10 722.05 722.05
2 3 -702.10 NaN NaN
3 4 -96.35 NaN NaN
4 5 -670.10 NaN NaN
5 6 -699.60 NaN NaN
6 7 -642.10 NaN NaN
7 8 -31.05 NaN NaN
8 9 -69.15 NaN NaN
9 10 -49.15 NaN NaN
10 11 -539.30 NaN NaN
11 12 -30.35 NaN NaN
12 13 -530.10 NaN NaN
13 14 569.10 -133.00 -133.00
14 15 -600.10 NaN NaN
15 16 601.10 1.00 1.00
16 17 620.10 NaN NaN
17 18 -46.65 NaN NaN
18 19 -615.20 NaN NaN
19 20 -581.80 NaN NaN
20 21 -562.80 NaN NaN
21 22 538.20 1158.30 491.55
22 23 79.85 NaN NaN
23 24 -582.30 NaN NaN
24 25 -615.10 NaN NaN
25 26 22.85 102.70 -559.45
26 27 -583.50 NaN NaN
27 28 -578.10 NaN NaN
If you want to avoid having a double value if the first row is positive:
m = df['rice'].gt(0)
m2 = (m&~m.shift(fill_value=False))
g = (df['rice']
.mask(m&~m2)
.groupby(m2[::-1].cumsum())
)
df.loc[m2, 'Outcome'] = (g.transform('first')
.mask(g.transform('size').eq(1), 0)
.add(df['rice'])
)
Example (with row 0 removed):
Rank rice Outcome
1 2 763.10 763.10
2 3 -702.10 NaN
3 4 -96.35 NaN
4 5 -670.10 NaN
5 6 -699.60 NaN
6 7 -642.10 NaN
7 8 -31.05 NaN
8 9 -69.15 NaN
9 10 -49.15 NaN
10 11 -539.30 NaN
11 12 -30.35 NaN
12 13 -530.10 NaN
13 14 569.10 -133.00
14 15 -600.10 NaN
15 16 601.10 1.00
16 17 620.10 NaN
17 18 -46.65 NaN
18 19 -615.20 NaN
19 20 -581.80 NaN
20 21 -562.80 NaN
21 22 538.20 491.55
22 23 79.85 NaN
23 24 -582.30 NaN
24 25 -615.10 NaN
25 26 22.85 -559.45
26 27 -583.50 NaN
27 28 -578.10 NaN