arraysjsonamazon-redshiftnested-jsonredshift-query

How to Extract Nested JSON Array in Redshift and Create a Table with Specific Columns?


I have a table in AWS Redshift with a column called json, which stores a nested JSON object as a string. I need to create a new table with three columns: sid, skill_name, and skill_vdd. The json column contains a nested JSON array under the key rows, and I want to extract the values from this array.

Here is a sample JSON in the json column:

{
    "Id": 888,
    "Command": "uuuu",
    "step": "rows": [
        {"sid": 1515, "skill_name": "jjjj", "skill_vdd": 9999},
        {"sid": 333, "skill_name": "ttt", "skill_vdd": 333}
    ]
}

I've tried using various redshift json functions, but I’m not able to flatten the json

How can I flatten this nested JSON and create a table with the following columns: sid, skill_name, skill_vdd from the json column in Redshift?


Solution

  • I'm going to assume that all entries for this column don't just have 2 values in their array "rows". In this case you will want to json_parse() this data into a SUPER data type and then "unnest" these array values. AWS has a good page on unnesting here - https://docs.aws.amazon.com/redshift/latest/dg/query-super.html

    This question comes up a lot and I've answered it before here - UNNEST Super data in redshift

    If you need help with getting this to work reach out again and show what code you have so far.