I want to create a new column 'mark' in the dataframe. I already set the value of 'mark' on True when a change of value occurs in column 'Numbers'. I would like to set the values on 'True' two row before and after a value change in column 'Numbers'.
I would like to see:
Numbers | mark |
---|---|
10 | False |
10 | False |
10 | False |
10 | False |
10 | True |
10 | True |
20 | True |
20 | True |
20 | False |
20 | False |
20 | True |
20 | True |
30 | True |
40 | True |
40 | True |
40 | False |
40 | False |
40 | False |
What in found so far:
import pandas as pd
data = [10,10,10,10,10,10,20,20,20,20,20,20,20,30,40,40,40,40,40]
df = pd.DataFrame(data, columns=['Numbers'])
check=10
def detect_changes (row):
global check
if row['Numbers'] == check :
return False
else :
check=check+10
return True
df['mark']=df.apply (lambda row: detect_changes(row), axis=1)
You can use a rolling max:
# better than the used function
df['mark'] = df['Numbers'].diff().eq(10)
N = 2 # number of rows before/after
df['mark2'] = df['mark'].rolling(2*N+1, min_periods=1, center=True).max().astype(bool)
output:
Numbers mark mark2
0 10 False False
1 10 False False
2 10 False False
3 10 False False
4 10 False True
5 10 False True
6 20 True True
7 20 False True
8 20 False True
9 20 False False
10 20 False False
11 20 False True
12 20 False True
13 30 True True
14 40 True True
15 40 False True
16 40 False True
17 40 False False
18 40 False False
# better than the used function
df['mark'] = df['Numbers'].diff().eq(10)
# 1 before
m1 = df.loc[::-1, 'mark'].rolling(1+1, min_periods=1).max().astype(bool)
# 2 after
m2 = df['mark'].rolling(2+1, min_periods=1).max().astype(bool)
df['mark2'] = m1|m2
output:
Numbers mark mark2
0 10 False False
1 10 False False
2 10 False False
3 10 False False
4 10 False False
5 10 False True
6 20 True True
7 20 False True
8 20 False True
9 20 False False
10 20 False False
11 20 False True
12 30 True True
13 40 True True
14 40 False True
15 40 False True
16 40 False False
17 40 False False