pattern-matchingsnowflake-cloud-data-platformmatch-recognize

How to define a pattern with match_recognize to find ordered events that aren't consecutive?


I'm trying to find sessions that contain 3 specific events, they need to be ordered, meaning that event_1 happens first, then event_2, then event_3, but they don't need to be concentrated exactly one after another. Instead, any number of other random events can be in between them. How do I define a pattern in the match_recognize clause to allow me to mark these events with the classifier statement, and also mark them in case the sequence is incomplete, if for example only event_1 happens, or if event_1 + event_2 happens?

Or is there any other way to do this that is more efficient and doesn't involve match_recognize? I'm trying to avoid multiple joins because data is huge.

Here's a dummy query for presentation:

select 
    session_id,
    event,
    event_dttm
from events
match_recognize (
    partition by session_id
    order by event_dttm
    measures
        classifier as var
    all rows per match with unmatched rows
    pattern (???answer needed???)
    define
        event_1 as event = 'Click image',
        event_2 as event = 'Open profile',
        event_3 as event = 'Leave review');

Solution

  • You can put another event in that is the "but not those others" and then have 0-many matches on that between:

    with this data:

    with events(session_id, event, event_dttm) as (
        SELECT * FROM VALUES
        (99, 0, 10)
        ,(99, 1, 11)
        ,(99, 2, 12)
        ,(99, 3, 13)
        
        ,(98, 1, 10)
        ,(98, 2, 11)
        ,(98, 3, 12)
        ,(98, 0, 13)
        
        ,(100, 1, 10)
        ,(100, 2, 11)
        ,(100, 3, 12)
        
        ,(101, 1, 10)
        ,(101, 0, 11)
        ,(101, 2, 12)
        ,(101, 3, 13)
    
        ,(102, 1, 10)
        ,(102, 0, 11)
        ,(102, 0, 12)
        ,(102, 2, 13)
        ,(102, 3, 14)
    
        ,(103, 1, 10)
        ,(103, 0, 11)
        ,(103, 2, 12)
        ,(103, 0, 13)
        ,(103, 3, 14)
        
        ,(104, 1, 10)
        ,(104, 0, 11)
        ,(104, 2, 12)
        ,(104, 0, 13)
        /* incomplete ,(104, 3, 14) */
    )
    
    select 
        *
    from events
    match_recognize (
        partition by session_id
        order by event_dttm
        measures
            classifier as var
        all rows per match with unmatched rows
        pattern (e1 ex* e2 ex* e3)
        define
            e1 as event = 1,
            e2 as event = 2,
            e3 as event = 3,
            ex as event not in (1,2,3))
    ORDER BY 1,3;
    

    gives:

    SESSION_ID EVENT EVENT_DTTM VAR
    98 1 10 E1
    98 2 11 E2
    98 3 12 E3
    98 0 13
    99 0 10
    99 1 11 E1
    99 2 12 E2
    99 3 13 E3
    100 1 10 E1
    100 2 11 E2
    100 3 12 E3
    101 1 10 E1
    101 0 11 EX
    101 2 12 E2
    101 3 13 E3
    102 1 10 E1
    102 0 11 EX
    102 0 12 EX
    102 2 13 E2
    102 3 14 E3
    103 1 10 E1
    103 0 11 EX
    103 2 12 E2
    103 0 13 EX
    103 3 14 E3
    104 1 10
    104 0 11
    104 2 12
    104 0 13