I have a tricky query design requirement, i have tried different types/different combination of Analytical function to achieve my result from the below data set. My other plan is do write stored proc, however i want to reach out to the expert group before i change my direction.
Input Data Set:
Required Output Data Set with the Group Column: When there is a session id change in the session id and if i get back the same session id again, i have to have a different group to it. I tried to use LEAD/LAG combination, however unable to get the below desired output, one or other scenario was breaking.
Thanks !
SQL language is expressive enough to find a declarative solution for complex requirements.
Snowflake has recently implemented SQL 2016 Standard clause: MATCH_RECOGNIZE, which was designed to solve such cases in very straighforward way.
Identifying Sequences of Rows That Match a Pattern
In some cases, you might need to identify sequences of table rows that match a pattern. For example, you might need to:
Determine which users followed a specific sequence of pages and actions on your website before opening a support ticket or making a purchase.
Find the stocks with prices that followed a V-shaped or W-shaped recovery over a period of time.
Look for patterns in sensor data that might indicate an upcoming system failure.
Data preparation:
CREATE OR REPLACE TABLE t
AS
SELECT 101 SESS_ID, 1 POL_ID, '2021-04-17 09:30:00'::DATE AS Trans_dt, 1 AS VERSION_ID
UNION ALL SELECT 101 SESS_ID, 1 POL_ID, '2021-04-17 09:35:00'::DATE AS Trans_dt, 2
UNION ALL SELECT 102 SESS_ID, 1 POL_ID, '2021-04-17 09:37:00'::DATE AS Trans_dt, 3
UNION ALL SELECT 102 SESS_ID, 1 POL_ID, '2021-04-17 09:38:00'::DATE AS Trans_dt, 4
UNION ALL SELECT 101 SESS_ID, 1 POL_ID, '2021-04-17 09:39:00'::DATE AS Trans_dt, 5
UNION ALL SELECT 101 SESS_ID, 1 POL_ID, '2021-04-17 09:40:00'::DATE AS Trans_dt, 6;
Query:
SELECT *
FROM t
MATCH_RECOGNIZE (
PARTITION BY POL_ID
ORDER BY VERSION_ID
MEASURES MATCH_NUMBER() AS group_id
--,CLASSIFIER() as cks
ALL ROWS PER MATCH
PATTERN (a+b*)
DEFINE a as sess_id = FIRST_VALUE(sess_id)
,b AS sess_id != FIRST_VALUE(sess_id)
) mr
ORDER BY POL_ID, VERSION_ID;
Output:
SESS_ID POL_ID TRANS_DT VERSION_ID GROUP_ID
101 1 2021-04-17 1 1
101 1 2021-04-17 2 1
102 1 2021-04-17 3 1
102 1 2021-04-17 4 1
101 1 2021-04-17 5 2
101 1 2021-04-17 6 2
How it works:
(a+b*)
which is Perl-style regexp, a(one or more) b(zero or more)MATCH_NUMBER()
-"Returns the sequential number of the match"POL_ID
and use VERSION_ID
as sorting column