google-bigquerysnowflake-cloud-data-platform

Translate Google BigQuery to Snowflake SQL


I'm translating a BigQuery below to Snowflake.

select event_date,user_pseudo_id,event_name, event_timestamp,
(SELECT COALESCE( value.string_value, CAST(value.int_value AS string), CAST(value.double_value AS string), CAST(value.float_value AS string) ) FROM UNNEST (event_params)   WHERE    KEY = 'key1' ) AS key1, 
(SELECT COALESCE( value.string_value, CAST(value.int_value AS string), CAST(value.double_value AS string), CAST(value.float_value AS string) ) FROM UNNEST (event_params)   WHERE    KEY = 'key2' ) AS key2 
FROM `bigquery.analytics.events_20240812`;

I expect the result as below. The way I'm doing in Snowflake is creating 2 separate tables for key1 and key2, then full join them together. However, I will have hundreds of keys in the future. It would difficult to accomplish the goal with hundreds of joins.

Thank you so much!

Below is what I got from BigQuery and would expect the same for Snowflake.

enter image description here


Solution

  • If your events look like:

    with events(event_date, user_pseudo_id, event_timestamp,event_params) as (
        select $1, $2, $3, parse_json($4) from values
        ('2024-08-12'::date, '111.11', 123456, '[
    {"key":"key0", "float_value":-1.234},{"key":"key1", "string_value":"Prifilled"},{"key":"key2", "string_value":"[UUID]"}
    ]'),
        ('2024-08-12'::date, '111.11', 123456, '[
    {"key":"key0", "float_value":4.567},{"key":"key1", "string_value":"Prifilled"},{"key":"key2", "string_value":"ot-group-id-COOO4"}
    ]')
    )
    

    which is too say, a row of values, and some JSON values, then FLATTEN is the most related to UNNEST command:

    select
        e.event_date,
        e.user_pseudo_id,
        e.event_timestamp,
        f.value:key as k,
        COALESCE( f.value:string_value, f.value:int_value::text, f.value:double_value::text, f.value:float_value::text) as v,
        f.seq 
    from events as e,
    table(flatten(input=>e.event_params)) as f
    where f.value:key in ('key1', 'key2')
    

    gives:

    enter image description here

    which grouping via the SEQ and using MAX and IFF to pivot the values like:

    select 
        event_date, 
        user_pseudo_id, 
        event_timestamp,
        min(iff(k='key1', v, null)) as key1,
        min(iff(k='key2', v, null)) as key2
    from (
        select
            e.event_date,
            e.user_pseudo_id,
            e.event_timestamp,
            f.value:key as k,
            COALESCE( f.value:string_value, f.value:int_value::text, f.value:double_value::text, f.value:float_value::text) as v,
            f.seq 
        from events as e,
        table(flatten(input=>e.event_params)) as f
        where f.value:key in ('key1', 'key2')
    )
    group by 1,2,3,seq
    

    gives:

    enter image description here

    which appears to match your impled data and desired result.