pythonpandasdataframepattern-matchingpandas-rolling

filter for specific sequences involving multiple columns and surrounding rows


I have data that looks like this:

enter image description here

It's standard financial price data (open, high, low, close).

In addition, I run some calculations. 'major_check' occasionally returns 1 or 2 (which 'minor_check' will then also return). 'minor_check' also returns 1 or 2, but more frequently. the rest is filled with 0 or NaN.

I'd like to test for specific patterns:

I highlighted a 21212 pattern in the screenshot to give a better idea on what I am looking for.

Once the 21212 or 12121 patterns are found, I'll check if specific rules applied on open/high/low/close (corresponding to the 5 rows constituting the pattern) are met or not.

Of course, one could naively iterate through the dataframe but this doesn't sound like the Pythonic way to do it.

I didn't manage to find a good way to do this, since a 21212 pattern can have some 0s inside it


Solution

  • As this answer by Timeless looked surprisingly complex, here is a quite simpler one.

    Method:

    You haven't specified how to flag the findings. Here they get marked as a True in two new columns, one for each pattern, appended to the original dataframe, for whatever use you would plan for them. They are called "hit1" and "hit2".

    Input data
    No text input data in your post, so until then I came up with my own. It is designed to produce one hit for each pattern:

    import pandas as pd
    import numpy as np
    
    # Start dataframe
    df = pd.DataFrame({'minor': [None,2,   1,   None,None,2,   None,2,   None,None,None,
                                 1,   2,   None,1,   None,None,2,   None,1,   2,   None,None],
                       'major': [None,None,None,None,None,2,   None,None,None,None,None,
                                 None,2,   None,1,   None,None,None,   None,None,None,None,None]})
    df
    
        minor  major
    0     NaN    NaN
    1     2.0    NaN
    2     1.0    NaN
    3     NaN    NaN
    4     NaN    NaN
    5     2.0    2.0
    6     NaN    NaN
    7     2.0    NaN
    8     NaN    NaN
    9     NaN    NaN
    10    NaN    NaN
    11    1.0    NaN
    12    2.0    2.0
    13    NaN    NaN
    14    1.0    1.0
    15    NaN    NaN
    16    NaN    NaN
    17    2.0    NaN
    18    NaN    NaN
    19    1.0    NaN
    20    2.0    NaN
    21    NaN    NaN
    22    NaN    NaN
    

    Locate hits
    Skip rows without test results

    # Remove rows without minor test result
    df1 = df.dropna(axis=0,subset='minor').copy()
    # No reset_index because we'll use it to report back to df.
    
    # Patterns of 'minor test' 
    minor_pat1 = [1,2,1,2,1]
    minor_pat2 = [2,1,2,1,2]
    

    Pattern search:

    # Deploy shift columns, looking 2 values backwards and 2 forward
    for i in range(-2,3):              # i in [-2,-1, 0, 1, 2]
        df1[i] = df1['minor'].shift(i) # create a column named i
    
    # Test for both patterns and major test value
    df1['hit1'] = np.where(df1['major']==1, # case 12121
                           df1[list(range(-2,3))].eq(minor_pat1).all(axis=1),
                           False)
    df1['hit2'] = np.where(df1['major']==2, # case 21212
                           df1[list(range(-2,3))].eq(minor_pat2).all(axis=1),
                           False)
    df1 # Temporary dataframe with findings located
    
        minor  major   -2   -1    0    1    2   hit1   hit2
    1     2.0    NaN  2.0  1.0  2.0  NaN  NaN  False  False
    2     1.0    NaN  2.0  2.0  1.0  2.0  NaN  False  False
    5     2.0    2.0  1.0  2.0  2.0  1.0  2.0  False  False
    7     2.0    NaN  2.0  1.0  2.0  2.0  1.0  False  False
    11    1.0    NaN  1.0  2.0  1.0  2.0  2.0  False  False
    12    2.0    2.0  2.0  1.0  2.0  1.0  2.0  False   True
    14    1.0    1.0  1.0  2.0  1.0  2.0  1.0   True  False
    17    2.0    NaN  2.0  1.0  2.0  1.0  2.0  False  False
    19    1.0    NaN  NaN  2.0  1.0  2.0  1.0  False  False
    20    2.0    NaN  NaN  NaN  2.0  1.0  2.0  False  False
    

    .rolling() was designed for that purpose exactly.
    Just too bad they haven't implemented .rolling().eq() yet (list of window functions).
    This is why we must resort to apply .eq() from inside a lambda function.

    df1['hit1'] = (df1['major']==1) & (df1['minor']
                                       .rolling(window=5, center=True)
                                       .apply(lambda x : x.eq(minor_pat1).all()))
    
    df1['hit2'] = (df1['major']==2) & (df1['minor']
                                       .rolling(window=5, center=True)
                                       .apply(lambda x : x.eq(minor_pat2).all()))
    
        minor  major   hit1   hit2
    1     2.0    NaN  False  False
    2     1.0    NaN  False  False
    5     2.0    2.0  False  False
    7     2.0    NaN  False  False
    11    1.0    NaN  False  False
    12    2.0    2.0  False   True
    14    1.0    1.0   True  False
    17    2.0    NaN  False  False
    19    1.0    NaN  False  False
    20    2.0    NaN  False  False
    

    Finally report back to original df

    df.loc[df1.index,'hit1'] = df1['hit1']
    df.loc[df1.index,'hit2'] = df1['hit2']
    df
    
        minor  major   hit1   hit2
    0     NaN    NaN    NaN    NaN
    1     2.0    NaN  False  False
    2     1.0    NaN  False  False
    3     NaN    NaN    NaN    NaN
    4     NaN    NaN    NaN    NaN
    5     2.0    2.0  False  False
    6     NaN    NaN    NaN    NaN
    7     2.0    NaN  False  False
    8     NaN    NaN    NaN    NaN
    9     NaN    NaN    NaN    NaN
    10    NaN    NaN    NaN    NaN
    11    1.0    NaN  False  False
    12    2.0    2.0  False   True
    13    NaN    NaN    NaN    NaN
    14    1.0    1.0   True  False
    15    NaN    NaN    NaN    NaN
    16    NaN    NaN    NaN    NaN
    17    2.0    NaN  False  False
    18    NaN    NaN    NaN    NaN
    19    1.0    NaN  False  False
    20    2.0    NaN  False  False
    21    NaN    NaN    NaN    NaN
    22    NaN    NaN    NaN    NaN