pythonpython-3.xpandasperiodicity

Identifying row numbers where value is stable before and after the value in the column hits a specified value


EDITED

I have a pandas dataframe like so:

data = {'ID': ['A', 'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'A',    'B',    'B',    'B',    'B',    'B',    'B',    'B',    'B',    'B',    'B'],
    'column_1': [0, 0,  0,  0,  0.1,    1,  1.5,    2,  3,  4,  4.5,    5,  4.9,    3,  2,  1.8,    1,  0,  0,  1,  3,  0,  1.3,    2,  3,  4.3,    4.8,    5,  4.2,    3.5,    3,  2.6,    2,  1.9,    1,  0,  0,  0,  0,  0,  0.1,    0.2,    0.3,    1,  2,  3,  5,  4,  2,  0.5,    0,  0],
    'column_2': [13,    25, 96, 59, 5,  92, 82, 141,    50, 85, 84, 113,    119,    128,    8,  133,    82, 10, 15, 62, 11, 68, 18, 24, 37, 55, 83, 48, 13, 81, 43, 36, 56, 43, 36, 46, 45, 127,    55, 67, 113,    98, 78, 78, 57, 131,    121,    126,    142,    51, 64, 95]}


    ID  column_1  column_2
0   A       0.0        13
1   A       0.0        25
2   A       0.0        96
3   A       0.0        59
4   A       0.1         5
5   A       1.0        92
6   A       1.5        82
7   A       2.0       141
8   A       3.0        50
9   A       4.0        85
10  A       4.5        84
11  A       5.0       113
12  A       4.9       119
13  A       3.0       128
14  A       2.0         8
15  A       1.8       133
16  A       1.0        82
17  A       0.0        10
18  A       0.0        15
19  A       1.0        62
20  A       3.0        11
21  A       0.0        68
22  A       1.3        18
23  A       2.0        24
24  A       3.0        37
25  A       4.3        55
26  A       4.8        83
27  A       5.0        48
28  A       4.2        13
29  A       3.5        81
30  A       3.0        43
31  A       2.6        36
32  A       2.0        56
33  A       1.9        43
34  A       1.0        36
35  A       0.0        46
36  A       0.0        45
37  A       0.0       127
38  A       0.0        55
39  A       0.0        67
40  A       0.1       113
41  A       0.2        98
42  B       0.3        78
43  B       1.0        78
44  B       2.0        57
45  B       3.0       131
46  B       5.0       121
47  B       4.0       126
48  B       2.0       142
49  B       0.5        51
50  B       0.0        64
51  B       0.0        95

Tracing back from when the value hits 5 in column_1, I want to find the value in column_2 just before the value in column_1 increased from 0 and just after it came back down to 0. So, in the data frame above, the values in column_2 would be 5, 10 and 18, 46. I want to perform some arithmetic and would like to add 2 columns before & after with those values grouped by the ID column. The expected output would be:

    ID  column_1  column_2  Before  After
0   A       0.0        13       0      0
1   A       0.0        25       0      0
2   A       0.0        96       0      0
3   A       0.0        59       0      0
4   A       0.1         5       0      0
5   A       1.0        92       0      0
6   A       1.5        82       0      0
7   A       2.0       141       0      0
8   A       3.0        50       0      0
9   A       4.0        85       0      0
10  A       4.5        84       0      0
11  A       5.0       113       5     10
12  A       4.9       119       0      0
13  A       3.0       128       0      0
14  A       2.0         8       0      0
15  A       1.8       133       0      0
16  A       1.0        82       0      0
17  A       0.0        10       0      0
18  A       0.0        15       0      0
19  A       1.0        62       0      0
20  A       3.0        11       0      0
21  A       0.0        68       0      0
22  A       1.3        18       0      0
23  A       2.0        24       0      0
24  A       3.0        37       0      0
25  A       4.3        55       0      0
26  A       4.8        83       0      0
27  A       5.0        48      18     46
28  A       4.2        13       0      0
29  A       3.5        81       0      0
30  A       3.0        43       0      0
31  A       2.6        36       0      0
32  A       2.0        56       0      0
33  A       1.9        43       0      0
34  A       1.0        36       0      0
35  A       0.0        46       0      0
36  A       0.0        45       0      0
37  A       0.0       127       0      0
38  A       0.0        55       0      0
39  A       0.0        67       0      0
40  A       0.1       113       0      0
41  A       0.2        98       0      0
42  B       0.3        78       0      0
43  B       1.0        78       0      0
44  B       2.0        57       0      0
45  B       3.0       131       0      0
46  B       5.0       121      78     64
47  B       4.0       126       0      0
48  B       2.0       142       0      0
49  B       0.5        51       0      0
50  B       0.0        64       0      0
51  B       0.0        95       0      0

For a given ID if column_1 starts with a non zero value, it should give the first value of column_2 for that group.

The rest of the rows in Before and After can be filled with null or zeroes. Is there an elegant way to achieve this?


Solution

  • With where and shift you can generate series where only the endpoints of the subsequences are filled and the rest is padded with NaN. Combined with ffill (or backfill, respectively), locating the rows where column_1 equals 5 gives pretty much the desired result:

    col1 = df.column_1
    col2 = df.column_2
    
    df.loc[col1.eq(5), "Before"] = col2.where(col1.gt(0) & col1.shift().eq(0)).ffill()
    df.loc[col1.eq(5), "After"] = col2.where(col1.eq(0) & col1.shift().gt(0)).backfill()
    

    Result:

       ID  column_1  column_2  Before  After
    0   A       0.0        13     NaN    NaN
    1   A       0.0        25     NaN    NaN
    2   A       0.0        96     NaN    NaN
    3   A       0.0        59     NaN    NaN
    4   A       0.1         5     NaN    NaN
    5   A       1.0        92     NaN    NaN
    6   A       1.5        82     NaN    NaN
    7   A       2.0       141     NaN    NaN
    8   A       3.0        50     NaN    NaN
    9   A       4.0        85     NaN    NaN
    10  A       4.5        84     NaN    NaN
    11  A       5.0       113     5.0   10.0
    12  A       4.9       119     NaN    NaN
    13  A       3.0       128     NaN    NaN
    14  A       2.0         8     NaN    NaN
    15  A       1.8       133     NaN    NaN
    16  A       1.0        82     NaN    NaN
    17  A       0.0        10     NaN    NaN
    18  A       0.0        15     NaN    NaN
    19  A       1.0        62     NaN    NaN
    20  A       3.0        11     NaN    NaN
    21  A       0.0        68     NaN    NaN
    22  A       1.3        18     NaN    NaN
    23  A       2.0        24     NaN    NaN
    24  A       3.0        37     NaN    NaN
    25  A       4.3        55     NaN    NaN
    26  A       4.8        83     NaN    NaN
    27  A       5.0        48    18.0   46.0
    28  A       4.2        13     NaN    NaN
    29  A       3.5        81     NaN    NaN
    30  A       3.0        43     NaN    NaN
    31  A       2.6        36     NaN    NaN
    32  A       2.0        56     NaN    NaN
    33  A       1.9        43     NaN    NaN
    34  A       1.0        36     NaN    NaN
    35  A       0.0        46     NaN    NaN
    36  A       0.0        45     NaN    NaN
    37  A       0.0       127     NaN    NaN
    38  A       0.0        55     NaN    NaN
    39  A       0.0        67     NaN    NaN
    

    With groupby()

    def set_flags(frame):
    
        col1 = frame.column_1
        col2 = frame.column_2
        m = col1.eq(5)
    
        frame.loc[m, "Before"] = col2.where(col1.gt(0) & col1.shift(fill_value=0).eq(0)).ffill()
        frame.loc[m, "After"] = col2.where(col1.eq(0) & col1.shift().gt(0)).backfill()
    
        return frame
    
    df[["Before", "After"]] = 0  # initializing columns with 0 keeps the int type
    df = df.groupby("ID").apply(set_flags)
    

    Result with new data

       ID  column_1  column_2  Before  After
    0   A       0.0        13       0      0
    1   A       0.0        25       0      0
    2   A       0.0        96       0      0
    3   A       0.0        59       0      0
    4   A       0.1         5       0      0
    5   A       1.0        92       0      0
    6   A       1.5        82       0      0
    7   A       2.0       141       0      0
    8   A       3.0        50       0      0
    9   A       4.0        85       0      0
    10  A       4.5        84       0      0
    11  A       5.0       113       5     10
    12  A       4.9       119       0      0
    13  A       3.0       128       0      0
    14  A       2.0         8       0      0
    15  A       1.8       133       0      0
    16  A       1.0        82       0      0
    17  A       0.0        10       0      0
    18  A       0.0        15       0      0
    19  A       1.0        62       0      0
    20  A       3.0        11       0      0
    21  A       0.0        68       0      0
    22  A       1.3        18       0      0
    23  A       2.0        24       0      0
    24  A       3.0        37       0      0
    25  A       4.3        55       0      0
    26  A       4.8        83       0      0
    27  A       5.0        48      18     46
    28  A       4.2        13       0      0
    29  A       3.5        81       0      0
    30  A       3.0        43       0      0
    31  A       2.6        36       0      0
    32  A       2.0        56       0      0
    33  A       1.9        43       0      0
    34  A       1.0        36       0      0
    35  A       0.0        46       0      0
    36  A       0.0        45       0      0
    37  A       0.0       127       0      0
    38  A       0.0        55       0      0
    39  A       0.0        67       0      0
    40  A       0.1       113       0      0
    41  A       0.2        98       0      0
    42  B       0.3        78       0      0
    43  B       1.0        78       0      0
    44  B       2.0        57       0      0
    45  B       3.0       131       0      0
    46  B       5.0       121      78     64
    47  B       4.0       126       0      0
    48  B       2.0       142       0      0
    49  B       0.5        51       0      0
    50  B       0.0        64       0      0
    51  B       0.0        95       0      0