google-analyticssnowflake-cloud-data-platformflattenlateral

How to parse nested json in snowflake using lateral flatten


WOuld like to extract values from event_params attributes, I would like to get values from event_params."key": "link_class":string_value Ex: event_params--> key=link_class and get "ABC-button loginblue" I need column name with link_class with "ABC-button loginblue" value in it

{
  "app_info": null,
  "collected_traffic_source": null,
  "device": {
    "advertising_id": null,
    "web_info": {
      "browser": "Edge",
      "browser_version": "116.0.1938.76",
      "hostname": "www.ABC.com"
    }
  },
  "ecommerce": null,

  "event_params": [
    {
      "key": "link_class",
      "value": {
        "double_value": null,
        "float_value": null,
        "int_value": null,
        "string_value": "ABC-button loginblue"
      }
    },
    {
      "key": "page_title",
      "value": {
        "double_value": null,
        "float_value": null,
        "int_value": null,
        "string_value": "Welcome  Forward | XYZ Bank"
      }
    }
  ],
  "user_pseudo_id": "899702629.1663712233"
}
select raw,
raw:device.web_info.hostname::string as hostname,
ev.value as event_params
from ANALYTICS_323116990 t ,
lateral flatten (input => t.raw, path =>'event_params') ev

Solution

  • It looks like you're looking at GA4's schema on Snowflake?

    I wrote a cookbook for that:

    My advice for this schema is to transform that "inconvenient" structure (forced by BigQuery's struct) to something easier to parse.

    A JS UDF can help you do this in real-time, or to transform the data into a new table that makes it easier to parse:

    create or replace function parse_ga4_objarray_fixed(V variant)
    returns variant
    language javascript
    AS
    $$
    var result = {};
    for (const x of V) {
      if (x.value){
        for (const [key, value] of Object.entries(x.value)) {
          if ( key ! = 'set_timestamp_micros') {
            result[x.key] = value;
          }
        }
      } else {
        result[x.key] = null;
      }
    }
    return result
    $$;
    

    With that transformation you can query the data in a more natural way:

    SELECT event_timestamp, event_params:value event_value
    FROM ga4
    WHERE event_name = 'purchase'
    AND event_timestamp BETWEEN '2020-12-01' AND '2020-12-02';