I have created the following pandas dataframe:
import pandas as pd
import numpy as np
ds = {'col1' : [234,321,284,286,287,300,301,303,305,299,288,300,299,287,286,280,279,270,269,301]}
df = pd.DataFrame(data=ds)
The dataframe looks like this:
display(df)
col1
0 234
1 321
2 284
3 286
4 287
5 300
6 301
7 303
8 305
9 299
10 288
11 300
12 299
13 287
14 286
15 280
16 279
17 270
18 269
19 301
I have then created two columns (cnsIncr
, cnsDecr
), which calculate the consecutive increase and decrease in col1
respectively.
cnsIncr = []
cnsDecr = []
col1 = np.array(df['col1'])
for i in range(len(df)):
cnsIncr.append(0)
cnsDecr.append(0)
if(col1[i] > col1[i-1]):
cnsIncr[i] = cnsIncr[i-1]+1
else:
cnsIncr[i] = 0
if(col1[i] < col1[i-1]):
cnsDecr[i] = cnsDecr[i-1]+1
else:
cnsDecr[i] = 0
df['cnsIncr'] = cnsIncr
df['cnsDecr'] = cnsDecr
The resulting dataframe looks as follows:
I need to create a columns called Trend
which is populated as follows:
cnsIncr
contains a value greater or equal to 5, then Trend = UpTrend
starting from the record in which consIncr = 0
and ending at the record for which cnsIncr >= 5
;cnsDecr
contains a value greater or equal to 5, then Trend = DownTrend
starting from the record in which cnsDecr = 0
and ending at the record for which cnsDecr >= 5
;cnsIncr
nor the column cnsDecr
contain a value of 5, then Trend = NoTrend
So, from the example above, the resulting dataframe would look like this:
How can I do this?
I think a simple approach, assuming groups always start with a 0
and are increasing until a reset, would be to groupby.transform
for stretches starting with 0
and assigning the value based on the last member of the group:
N = 5
# form groups starting with 0 for each column
g1 = df['cnsIncr'].eq(0).cumsum()
g2 = df['cnsDecr'].eq(0).cumsum()
# identify if the last value is >= 5
m1 = df['cnsIncr'].groupby(g1).transform('last').ge(N)
m2 = df['cnsDecr'].groupby(g2).transform('last').ge(N)
# assign the names based on the masks
df['Trend'] = np.select([m1, m2], ['TrendUp', 'TrendDown'], 'NoTrend')
Output:
col1 cnsIncr cnsDecr Trend
0 234 0 1 NoTrend
1 321 1 0 NoTrend
2 284 0 1 TrendUp
3 286 1 0 TrendUp
4 287 2 0 TrendUp
5 300 3 0 TrendUp
6 301 4 0 TrendUp
7 303 5 0 TrendUp
8 305 6 0 TrendUp
9 299 0 1 NoTrend
10 288 0 2 NoTrend
11 300 1 0 TrendDown
12 299 0 1 TrendDown
13 287 0 2 TrendDown
14 286 0 3 TrendDown
15 280 0 4 TrendDown
16 279 0 5 TrendDown
17 270 0 6 TrendDown
18 269 0 7 TrendDown
19 301 1 0 NoTrend
Intermediates:
col1 cnsIncr cnsDecr Trend g1 g2 m1 m2
0 234 0 1 NoTrend 1 0 False False
1 321 1 0 NoTrend 1 1 False False
2 284 0 1 TrendUp 2 1 True False
3 286 1 0 TrendUp 2 2 True False
4 287 2 0 TrendUp 2 3 True False
5 300 3 0 TrendUp 2 4 True False
6 301 4 0 TrendUp 2 5 True False
7 303 5 0 TrendUp 2 6 True False
8 305 6 0 TrendUp 2 7 True False
9 299 0 1 NoTrend 3 7 False False
10 288 0 2 NoTrend 4 7 False False
11 300 1 0 TrendDown 4 8 False True
12 299 0 1 TrendDown 5 8 False True
13 287 0 2 TrendDown 6 8 False True
14 286 0 3 TrendDown 7 8 False True
15 280 0 4 TrendDown 8 8 False True
16 279 0 5 TrendDown 9 8 False True
17 270 0 6 TrendDown 10 8 False True
18 269 0 7 TrendDown 11 8 False True
19 301 1 0 NoTrend 11 9 False False
Note that you don't need to compute the cnsIncr
/cnsDecr
columns with a loop:
N = 5
g1 = df['cnsIncr'].eq(0).cumsum()
g2 = df['cnsDecr'].eq(0).cumsum()
m1 = df['cnsIncr'].groupby(g1).transform('last').gt(N)
m2 = df['cnsDecr'].groupby(g2).transform('last').gt(N)
df['Trend'] = np.select([m1, m2], ['TrendUp', 'TrendDown'], 'NoTrend')
Output with intermediates:
col1 Trend g1 g2 m1 m2
0 234 NoTrend 1 0 False False
1 321 NoTrend 1 1 False False
2 284 TrendUp 2 1 True False
3 286 TrendUp 2 2 True False
4 287 TrendUp 2 3 True False
5 300 TrendUp 2 4 True False
6 301 TrendUp 2 5 True False
7 303 TrendUp 2 6 True False
8 305 TrendUp 2 7 True False
9 299 NoTrend 3 7 False False
10 288 NoTrend 4 7 False False
11 300 TrendDown 4 8 False True
12 299 TrendDown 5 8 False True
13 287 TrendDown 6 8 False True
14 286 TrendDown 7 8 False True
15 280 TrendDown 8 8 False True
16 279 TrendDown 9 8 False True
17 270 TrendDown 10 8 False True
18 269 TrendDown 11 8 False True
19 301 NoTrend 11 9 False False