pythonpandasdataframeif-statementmultiple-columns

Assign column status retrospectively in pandas


I have created the following pandas dataframe:

import pandas as pd
import numpy as np

ds = {'col1' : [234,321,284,286,287,300,301,303,305,299,288,300,299,287,286,280,279,270,269,301]}

df = pd.DataFrame(data=ds)

The dataframe looks like this:

display(df)

    col1
0   234
1   321
2   284
3   286
4   287
5   300
6   301
7   303
8   305
9   299
10  288
11  300
12  299
13  287
14  286
15  280
16  279
17  270
18  269
19  301

I have then created two columns (cnsIncr , cnsDecr), which calculate the consecutive increase and decrease in col1 respectively.

cnsIncr = []
cnsDecr = []

col1 = np.array(df['col1'])

for i in range(len(df)):
    cnsIncr.append(0)
    cnsDecr.append(0)
    if(col1[i] > col1[i-1]):
        cnsIncr[i] = cnsIncr[i-1]+1
    else:
        cnsIncr[i] = 0

    if(col1[i] < col1[i-1]):
        cnsDecr[i] = cnsDecr[i-1]+1
    else:
        cnsDecr[i] = 0

df['cnsIncr'] = cnsIncr
df['cnsDecr'] = cnsDecr

The resulting dataframe looks as follows:

enter image description here

I need to create a columns called Trend which is populated as follows:

So, from the example above, the resulting dataframe would look like this:

enter image description here

How can I do this?


Solution

  • I think a simple approach, assuming groups always start with a 0 and are increasing until a reset, would be to groupby.transform for stretches starting with 0 and assigning the value based on the last member of the group:

    N = 5
    
    # form groups starting with 0 for each column
    g1 = df['cnsIncr'].eq(0).cumsum()
    g2 = df['cnsDecr'].eq(0).cumsum()
    
    # identify if the last value is >= 5
    m1 = df['cnsIncr'].groupby(g1).transform('last').ge(N)
    m2 = df['cnsDecr'].groupby(g2).transform('last').ge(N)
    
    # assign the names based on the masks
    df['Trend'] = np.select([m1, m2], ['TrendUp', 'TrendDown'], 'NoTrend')
    

    Output:

        col1  cnsIncr  cnsDecr      Trend
    0    234        0        1    NoTrend
    1    321        1        0    NoTrend
    2    284        0        1    TrendUp
    3    286        1        0    TrendUp
    4    287        2        0    TrendUp
    5    300        3        0    TrendUp
    6    301        4        0    TrendUp
    7    303        5        0    TrendUp
    8    305        6        0    TrendUp
    9    299        0        1    NoTrend
    10   288        0        2    NoTrend
    11   300        1        0  TrendDown
    12   299        0        1  TrendDown
    13   287        0        2  TrendDown
    14   286        0        3  TrendDown
    15   280        0        4  TrendDown
    16   279        0        5  TrendDown
    17   270        0        6  TrendDown
    18   269        0        7  TrendDown
    19   301        1        0    NoTrend
    

    Intermediates:

        col1  cnsIncr  cnsDecr      Trend  g1  g2     m1     m2
    0    234        0        1    NoTrend   1   0  False  False
    1    321        1        0    NoTrend   1   1  False  False
    2    284        0        1    TrendUp   2   1   True  False
    3    286        1        0    TrendUp   2   2   True  False
    4    287        2        0    TrendUp   2   3   True  False
    5    300        3        0    TrendUp   2   4   True  False
    6    301        4        0    TrendUp   2   5   True  False
    7    303        5        0    TrendUp   2   6   True  False
    8    305        6        0    TrendUp   2   7   True  False
    9    299        0        1    NoTrend   3   7  False  False
    10   288        0        2    NoTrend   4   7  False  False
    11   300        1        0  TrendDown   4   8  False   True
    12   299        0        1  TrendDown   5   8  False   True
    13   287        0        2  TrendDown   6   8  False   True
    14   286        0        3  TrendDown   7   8  False   True
    15   280        0        4  TrendDown   8   8  False   True
    16   279        0        5  TrendDown   9   8  False   True
    17   270        0        6  TrendDown  10   8  False   True
    18   269        0        7  TrendDown  11   8  False   True
    19   301        1        0    NoTrend  11   9  False  False
    

    Note that you don't need to compute the cnsIncr/cnsDecr columns with a loop:

    N = 5
    
    g1 = df['cnsIncr'].eq(0).cumsum()
    g2 = df['cnsDecr'].eq(0).cumsum()
    
    m1 = df['cnsIncr'].groupby(g1).transform('last').gt(N)
    m2 = df['cnsDecr'].groupby(g2).transform('last').gt(N)
    
    df['Trend'] = np.select([m1, m2], ['TrendUp', 'TrendDown'], 'NoTrend')
    

    Output with intermediates:

        col1      Trend  g1  g2     m1     m2
    0    234    NoTrend   1   0  False  False
    1    321    NoTrend   1   1  False  False
    2    284    TrendUp   2   1   True  False
    3    286    TrendUp   2   2   True  False
    4    287    TrendUp   2   3   True  False
    5    300    TrendUp   2   4   True  False
    6    301    TrendUp   2   5   True  False
    7    303    TrendUp   2   6   True  False
    8    305    TrendUp   2   7   True  False
    9    299    NoTrend   3   7  False  False
    10   288    NoTrend   4   7  False  False
    11   300  TrendDown   4   8  False   True
    12   299  TrendDown   5   8  False   True
    13   287  TrendDown   6   8  False   True
    14   286  TrendDown   7   8  False   True
    15   280  TrendDown   8   8  False   True
    16   279  TrendDown   9   8  False   True
    17   270  TrendDown  10   8  False   True
    18   269  TrendDown  11   8  False   True
    19   301    NoTrend  11   9  False  False