I'm trying to find sessions that contain 3 specific events, they need to be ordered, meaning that event_1 happens first, then event_2, then event_3, but they don't need to be concentrated exactly one after another. Instead, any number of other random events can be in between them. How do I define a pattern in the match_recognize clause to allow me to mark these events with the classifier statement, and also mark them in case the sequence is incomplete, if for example only event_1 happens, or if event_1 + event_2 happens?
Or is there any other way to do this that is more efficient and doesn't involve match_recognize? I'm trying to avoid multiple joins because data is huge.
Here's a dummy query for presentation:
select
session_id,
event,
event_dttm
from events
match_recognize (
partition by session_id
order by event_dttm
measures
classifier as var
all rows per match with unmatched rows
pattern (???answer needed???)
define
event_1 as event = 'Click image',
event_2 as event = 'Open profile',
event_3 as event = 'Leave review');
You can put another event in that is the "but not those others" and then have 0-many matches on that between:
with this data:
with events(session_id, event, event_dttm) as (
SELECT * FROM VALUES
(99, 0, 10)
,(99, 1, 11)
,(99, 2, 12)
,(99, 3, 13)
,(98, 1, 10)
,(98, 2, 11)
,(98, 3, 12)
,(98, 0, 13)
,(100, 1, 10)
,(100, 2, 11)
,(100, 3, 12)
,(101, 1, 10)
,(101, 0, 11)
,(101, 2, 12)
,(101, 3, 13)
,(102, 1, 10)
,(102, 0, 11)
,(102, 0, 12)
,(102, 2, 13)
,(102, 3, 14)
,(103, 1, 10)
,(103, 0, 11)
,(103, 2, 12)
,(103, 0, 13)
,(103, 3, 14)
,(104, 1, 10)
,(104, 0, 11)
,(104, 2, 12)
,(104, 0, 13)
/* incomplete ,(104, 3, 14) */
)
select
*
from events
match_recognize (
partition by session_id
order by event_dttm
measures
classifier as var
all rows per match with unmatched rows
pattern (e1 ex* e2 ex* e3)
define
e1 as event = 1,
e2 as event = 2,
e3 as event = 3,
ex as event not in (1,2,3))
ORDER BY 1,3;
gives:
SESSION_ID | EVENT | EVENT_DTTM | VAR |
---|---|---|---|
98 | 1 | 10 | E1 |
98 | 2 | 11 | E2 |
98 | 3 | 12 | E3 |
98 | 0 | 13 | |
99 | 0 | 10 | |
99 | 1 | 11 | E1 |
99 | 2 | 12 | E2 |
99 | 3 | 13 | E3 |
100 | 1 | 10 | E1 |
100 | 2 | 11 | E2 |
100 | 3 | 12 | E3 |
101 | 1 | 10 | E1 |
101 | 0 | 11 | EX |
101 | 2 | 12 | E2 |
101 | 3 | 13 | E3 |
102 | 1 | 10 | E1 |
102 | 0 | 11 | EX |
102 | 0 | 12 | EX |
102 | 2 | 13 | E2 |
102 | 3 | 14 | E3 |
103 | 1 | 10 | E1 |
103 | 0 | 11 | EX |
103 | 2 | 12 | E2 |
103 | 0 | 13 | EX |
103 | 3 | 14 | E3 |
104 | 1 | 10 | |
104 | 0 | 11 | |
104 | 2 | 12 | |
104 | 0 | 13 |