sqlsnowflake-cloud-data-platformanalytical

Snowflake Analytical Query Design


I have a tricky query design requirement, i have tried different types/different combination of Analytical function to achieve my result from the below data set. My other plan is do write stored proc, however i want to reach out to the expert group before i change my direction.

Input Data Set:

enter image description here

Required Output Data Set with the Group Column: When there is a session id change in the session id and if i get back the same session id again, i have to have a different group to it. I tried to use LEAD/LAG combination, however unable to get the below desired output, one or other scenario was breaking.

enter image description here

Thanks !


Solution

  • SQL language is expressive enough to find a declarative solution for complex requirements.

    Snowflake has recently implemented SQL 2016 Standard clause: MATCH_RECOGNIZE, which was designed to solve such cases in very straighforward way.

    Identifying Sequences of Rows That Match a Pattern

    In some cases, you might need to identify sequences of table rows that match a pattern. For example, you might need to:

    • Determine which users followed a specific sequence of pages and actions on your website before opening a support ticket or making a purchase.

    • Find the stocks with prices that followed a V-shaped or W-shaped recovery over a period of time.

    • Look for patterns in sensor data that might indicate an upcoming system failure.

    Data preparation:

    CREATE OR REPLACE TABLE t
    AS
    SELECT 101 SESS_ID, 1 POL_ID, '2021-04-17 09:30:00'::DATE AS Trans_dt, 1 AS VERSION_ID
    UNION ALL SELECT 101 SESS_ID, 1 POL_ID, '2021-04-17 09:35:00'::DATE AS Trans_dt, 2
    UNION ALL SELECT 102 SESS_ID, 1 POL_ID, '2021-04-17 09:37:00'::DATE AS Trans_dt, 3
    UNION ALL SELECT 102 SESS_ID, 1 POL_ID, '2021-04-17 09:38:00'::DATE AS Trans_dt, 4
    UNION ALL SELECT 101 SESS_ID, 1 POL_ID, '2021-04-17 09:39:00'::DATE AS Trans_dt, 5
    UNION ALL SELECT 101 SESS_ID, 1 POL_ID, '2021-04-17 09:40:00'::DATE AS Trans_dt, 6;
    

    Query:

    SELECT *
    FROM t
    MATCH_RECOGNIZE (
        PARTITION BY POL_ID
        ORDER BY VERSION_ID
        MEASURES MATCH_NUMBER() AS group_id
                --,CLASSIFIER() as cks
        ALL ROWS PER MATCH
        PATTERN (a+b*)
        DEFINE a as sess_id = FIRST_VALUE(sess_id)
              ,b AS sess_id != FIRST_VALUE(sess_id)
    ) mr
    ORDER BY POL_ID, VERSION_ID;
    

    db<>fiddle demo(Oracle)

    Output:

    SESS_ID POL_ID  TRANS_DT    VERSION_ID  GROUP_ID
    101 1   2021-04-17  1   1
    101 1   2021-04-17  2   1
    102 1   2021-04-17  3   1
    102 1   2021-04-17  4   1
    101 1   2021-04-17  5   2
    101 1   2021-04-17  6   2
    

    How it works:

    1. Define a pattern:(a+b*) which is Perl-style regexp, a(one or more) b(zero or more)
    2. Define pattern components a(sess_id is the same as first element of group), b(sess_id is not the same as first element of group)
    3. Define measure MATCH_NUMBER() -"Returns the sequential number of the match"
    4. Perform this operation per each POL_ID and use VERSION_ID as sorting column