pythonpandas

How to mark 2 rows before and after a value change in a specific column?


I want to create a new column 'mark' in the dataframe. I already set the value of 'mark' on True when a change of value occurs in column 'Numbers'. I would like to set the values on 'True' two row before and after a value change in column 'Numbers'.

I would like to see:

Numbers mark
10 False
10 False
10 False
10 False
10 True
10 True
20 True
20 True
20 False
20 False
20 True
20 True
30 True
40 True
40 True
40 False
40 False
40 False

What in found so far:

import pandas as pd
data = [10,10,10,10,10,10,20,20,20,20,20,20,20,30,40,40,40,40,40] 
df = pd.DataFrame(data, columns=['Numbers'])
check=10
def detect_changes (row):
    global check 
    if row['Numbers'] == check :           
        return False
    else :
        check=check+10
        return True
df['mark']=df.apply (lambda row: detect_changes(row), axis=1)

Solution

  • You can use a rolling max:

    # better than the used function
    df['mark'] = df['Numbers'].diff().eq(10)
    
    N = 2 # number of rows before/after 
    df['mark2'] = df['mark'].rolling(2*N+1, min_periods=1, center=True).max().astype(bool)
    

    output:

        Numbers   mark  mark2
    0        10  False  False
    1        10  False  False
    2        10  False  False
    3        10  False  False
    4        10  False   True
    5        10  False   True
    6        20   True   True
    7        20  False   True
    8        20  False   True
    9        20  False  False
    10       20  False  False
    11       20  False   True
    12       20  False   True
    13       30   True   True
    14       40   True   True
    15       40  False   True
    16       40  False   True
    17       40  False  False
    18       40  False  False
    

    1 before, 2 after

    # better than the used function
    df['mark'] = df['Numbers'].diff().eq(10)
    
    # 1 before
    m1 = df.loc[::-1, 'mark'].rolling(1+1, min_periods=1).max().astype(bool)
    
    # 2 after
    m2 = df['mark'].rolling(2+1, min_periods=1).max().astype(bool)
    
    df['mark2'] = m1|m2
    

    output:

        Numbers   mark  mark2
    0        10  False  False
    1        10  False  False
    2        10  False  False
    3        10  False  False
    4        10  False  False
    5        10  False   True
    6        20   True   True
    7        20  False   True
    8        20  False   True
    9        20  False  False
    10       20  False  False
    11       20  False   True
    12       30   True   True
    13       40   True   True
    14       40  False   True
    15       40  False   True
    16       40  False  False
    17       40  False  False