I have data that looks like this: standard financial price data (open, high, low, close). In addition, I run some calculations. 'major_check' occasionally returns 1 or 2 (which 'minor_check' will then also return); 'minor_check' also returns 1 or 2, but more frequently. The rest is filled with 0 or NaN.
I'd like to test for specific patterns:

- For a 2 in 'major_check', I want to see if I can find a 21212 pattern in 'minor_check', with 21 preceding the central 2 and 12 following it.
- For a 1 in 'major_check', I'd like to find a 12121 pattern in 'minor_check'.

I highlighted a 21212 pattern in the screenshot to give a better idea of what I am looking for.
Once the 21212 or 12121 patterns are found, I'll check whether specific rules on open/high/low/close (for the 5 rows constituting the pattern) are met or not.
Of course, one could naively iterate through the dataframe, but that doesn't sound like the Pythonic way to do it.
I didn't manage to find a good way to do this, since a 21212 pattern can have some 0s inside it.
As the answer by Timeless looked surprisingly complex, here is a considerably simpler one.
Method:

- Drop the rows without a minor test result (NaN and None),
- use numpy.where and pandas.shift to check for patterns row-wise (faster), or pandas.rolling (probably faster, more compact, and still readable),
- report the findings back to df.

You haven't specified how to flag the findings. Here they get marked as True in two new columns appended to the original dataframe, one for each pattern, for whatever use you plan for them. They are called "hit1" and "hit2".
Input data
There is no text input data in your post, so I made up my own. It is designed to produce one hit for each pattern:
import pandas as pd
import numpy as np
# Start dataframe
df = pd.DataFrame({'minor': [None,2, 1, None,None,2, None,2, None,None,None,
1, 2, None,1, None,None,2, None,1, 2, None,None],
'major': [None,None,None,None,None,2, None,None,None,None,None,
None,2, None,1, None,None,None, None,None,None,None,None]})
df
minor major
0 NaN NaN
1 2.0 NaN
2 1.0 NaN
3 NaN NaN
4 NaN NaN
5 2.0 2.0
6 NaN NaN
7 2.0 NaN
8 NaN NaN
9 NaN NaN
10 NaN NaN
11 1.0 NaN
12 2.0 2.0
13 NaN NaN
14 1.0 1.0
15 NaN NaN
16 NaN NaN
17 2.0 NaN
18 NaN NaN
19 1.0 NaN
20 2.0 NaN
21 NaN NaN
22 NaN NaN
Locate hits
Skip rows without test results
# Remove rows without minor test result
df1 = df.dropna(axis=0,subset='minor').copy()
# No reset_index because we'll use it to report back to df.
# Patterns of 'minor test'
minor_pat1 = [1,2,1,2,1]
minor_pat2 = [2,1,2,1,2]
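A quick aside, as a minimal illustration (not part of the solution itself): dropna keeps the original index labels, which is what lets the findings be written back to df at the end, and it is also why shifting within df1 sees across the NaN gaps of the original frame (the reason a 21212 pattern may contain 0s or NaNs in df):

```python
import pandas as pd

# dropna keeps the original index labels, so rows that were separated
# by NaNs in df become neighbours in df1.
df = pd.DataFrame({'minor': [None, 2, None, None, 1, 2]})
df1 = df.dropna(subset=['minor'])

print(list(df1.index))                 # original labels survive: [1, 4, 5]
print(df1['minor'].shift(1).tolist())  # previous non-NaN value: [nan, 2.0, 1.0]
```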
Pattern search: .shift() and np.where
# Deploy shift columns, looking 2 values backwards and 2 forward
for i in range(-2,3): # i in [-2,-1, 0, 1, 2]
df1[i] = df1['minor'].shift(i) # create a column named i
# Test for both patterns and major test value
df1['hit1'] = np.where(df1['major']==1, # case 12121
df1[list(range(-2,3))].eq(minor_pat1).all(axis=1),
False)
df1['hit2'] = np.where(df1['major']==2, # case 21212
df1[list(range(-2,3))].eq(minor_pat2).all(axis=1),
False)
df1 # Temporary dataframe with findings located
minor major -2 -1 0 1 2 hit1 hit2
1 2.0 NaN 2.0 1.0 2.0 NaN NaN False False
2 1.0 NaN 2.0 2.0 1.0 2.0 NaN False False
5 2.0 2.0 1.0 2.0 2.0 1.0 2.0 False False
7 2.0 NaN 2.0 1.0 2.0 2.0 1.0 False False
11 1.0 NaN 1.0 2.0 1.0 2.0 2.0 False False
12 2.0 2.0 2.0 1.0 2.0 1.0 2.0 False True
14 1.0 1.0 1.0 2.0 1.0 2.0 1.0 True False
17 2.0 NaN 2.0 1.0 2.0 1.0 2.0 False False
19 1.0 NaN NaN 2.0 1.0 2.0 1.0 False False
20 2.0 NaN NaN NaN 2.0 1.0 2.0 False False
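For clarity on the .eq() step above: comparing a DataFrame to a plain list broadcasts the list across the columns, one element per column in column order, which is why the column order -2, -1, 0, 1, 2 matters. A minimal illustration:

```python
import pandas as pd

# DataFrame.eq(list) aligns the list with the columns, element by element,
# so each row is compared against the whole pattern at once.
w = pd.DataFrame({-2: [1, 2], -1: [2, 1], 0: [1, 2], 1: [2, 1], 2: [1, 2]})
hits = w.eq([1, 2, 1, 2, 1]).all(axis=1)

print(hits.tolist())  # [True, False]: only row 0 matches column by column
```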
Pattern search: .rolling(), preferred

.rolling() was designed for exactly that purpose. Too bad they haven't implemented .rolling().eq() yet (see the list of window functions), so we must apply .eq() from inside a lambda function.
df1['hit1'] = (df1['major']==1) & (df1['minor']
              .rolling(window=5, center=True)
              .apply(lambda x: x.eq(minor_pat1).all())
              .fillna(0).astype(bool))  # apply() returns floats/NaN, not bools
df1['hit2'] = (df1['major']==2) & (df1['minor']
              .rolling(window=5, center=True)
              .apply(lambda x: x.eq(minor_pat2).all())
              .fillna(0).astype(bool))
minor major hit1 hit2
1 2.0 NaN False False
2 1.0 NaN False False
5 2.0 2.0 False False
7 2.0 NaN False False
11 1.0 NaN False False
12 2.0 2.0 False True
14 1.0 1.0 True False
17 2.0 NaN False False
19 1.0 NaN False False
20 2.0 NaN False False
Finally, report back to the original df
df.loc[df1.index,'hit1'] = df1['hit1']
df.loc[df1.index,'hit2'] = df1['hit2']
df
minor major hit1 hit2
0 NaN NaN NaN NaN
1 2.0 NaN False False
2 1.0 NaN False False
3 NaN NaN NaN NaN
4 NaN NaN NaN NaN
5 2.0 2.0 False False
6 NaN NaN NaN NaN
7 2.0 NaN False False
8 NaN NaN NaN NaN
9 NaN NaN NaN NaN
10 NaN NaN NaN NaN
11 1.0 NaN False False
12 2.0 2.0 False True
13 NaN NaN NaN NaN
14 1.0 1.0 True False
15 NaN NaN NaN NaN
16 NaN NaN NaN NaN
17 2.0 NaN False False
18 NaN NaN NaN NaN
19 1.0 NaN False False
20 2.0 NaN False False
21 NaN NaN NaN NaN
22 NaN NaN NaN NaN
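As a follow-up sketch (not part of the solution above): once the flags are set, the five rows constituting each pattern can be collected by position within df1, since the pattern rows are consecutive there but not in df. The OHLC rule itself is left as a hypothetical placeholder, since your actual rules weren't specified:

```python
import pandas as pd

# Rebuild the reduced frame from above: minor test values under their
# original index labels, with the single 21212 hit flagged at label 12.
df1 = pd.DataFrame(
    {'minor': [2, 1, 2, 2, 1, 2, 1, 2, 1, 2],
     'hit2':  [False, False, False, False, False,
               True, False, False, False, False]},
    index=[1, 2, 5, 7, 11, 12, 14, 17, 19, 20])

# Map index labels to positions: the 5 pattern rows are consecutive
# in df1 (NaN rows were dropped), not in the original df.
pos = {idx: p for p, idx in enumerate(df1.index)}

for idx in df1.index[df1['hit2']]:
    window = df1.iloc[pos[idx] - 2 : pos[idx] + 3]  # the 5 pattern rows
    print(list(window.index))  # [7, 11, 12, 14, 17]
    # A hypothetical OHLC rule would be tested here, e.g.:
    # rule_met = (window['close'] > window['open']).all()
```

The printed labels are exactly the rows highlighted True/hit2 in the tables above plus the two pattern rows on each side, ready for whatever OHLC checks you have in mind.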