sqljsonsnowflake-cloud-data-platformflatten

How to extract data as columns from json format string in snowflake where each record has different values in json


A sample of the data is as below:

enter image description here

the properties column is a varchar column in the table. It can have any combination of name & value properties. It can also be null for a lot of records. I want the output to be as follows - all the properties within every json to be a new column:

enter image description here

How can this be achieved in snowflake?

Additional details: The properties column has json formats within list,

[
  {
    "name": "_is-bundle",
    "value": "Yes"
  },
  {
    "name": "_bundle-id",
    "value": "xyz-abc"
  }
]

I use an ETL tool but I could not achieve what I wanted. I tried the lateral flatten method on snowflake but it throws an error for the properties column being varchar.


Solution

  • One option is to first flatten the JSON and then use case statements to identify when certain values belong in certain columns. (Assuming you already have a table with the JSON values in it, you can just skip the CTE in this query I've created below.)

    WITH JSON_DATA AS (SELECT (PARSE_JSON('[
            {"name": "_is-bundle", "value": "Yes"},
            {"name": "_bundle-id", "value": "xyz-abc"}
        ]')) AS json_data
                   UNION ALL
                   SELECT (PARSE_JSON('[
            {"name": "_is-bundle", "value": "Yes"},
            {"name": "_bundle-id", "value": "fig-gif"}
        ]'))
                   UNION ALL
                   SELECT (PARSE_JSON('[
            {"name": "_gift-grp", "value": "axx25"},
            {"name": "_is-free", "value": "Yes"}
        ]'))
                   UNION ALL
                   SELECT (PARSE_JSON('[
            {"name": "_group-child", "value": "Yes"},
            {"name": "_group-id", "value": "100nj"},
            {"name": "_child-id", "value": "600nj"}
        ]')))
    SELECT
       CASE WHEN flattened.value:name = '_is-bundle'
           THEN flattened.value:value END   AS is_bundle,
       CASE WHEN flattened.value:name = '_bundle-id'
           THEN flattened.value:value END   AS bundle_id,
       CASE WHEN flattened.value:name = '_gift-grp'
           THEN flattened.value:value END    AS gift_grp,
       CASE WHEN flattened.value:name = '_is-free'
           THEN flattened.value:value END     AS is_free,
       CASE WHEN flattened.value:name = '_group-child'
           THEN flattened.value:value END AS group_child,
       CASE WHEN flattened.value:name = '_group-id'
           THEN flattened.value:value END    AS group_id,
       CASE WHEN flattened.value:name = '_child-id'
           THEN flattened.value:value END    AS child_id
    FROM JSON_DATA,
     LATERAL FLATTEN(input => JSON_DATA.json_data) AS FLATTENED;