arraysjsonobjectamazon-redshiftdbt

Extract JSON array from object (redshift)


I need to extract data from an array nested inside an object (let's say the column it lives in is metadata). It looks like the following:

{
    "name": "abc",
    "references": "abc",
    "owners": [
        "xyz", "abc"
    ],
    "groups": ["x", "x"]
}

Sometimes the metadata field could look like this:

{
    "name": "abc",
    "references": "abc",
    "owners": [
        "xyz", "abc"
    ],
    "groups": ["x"] -- one less value
}

Or this:

{
    "name": "abc",
    "references": "abc",
    "sources": "abc", -- new column
    "owners": [
        "xyz", "abc"
    ],
    "groups": ["x", "x"]
}

It would be really helpful if someone could give me some advice! Ideally I would unpivot the keys into their own columns and the values would be data within those columns, as rows. But I'm struggling to figure out how to parse it first

I've tried:

json_extract_path_text(metadata, 'groups') -- returns ["x", "x"]
JSON_EXTRACT_PATH_TEXT(JSON_EXTRACT_ARRAY_ELEMENT_TEXT(model_meta::VARCHAR, 0), 'groups'::VARCHAR) -- returns an error: JSON parsing error, 8001, nvalid json array object {}

Solution

  • You got your calls backwards:

    JSON_EXTRACT_ARRAY_ELEMENT_TEXT(JSON_EXTRACT_PATH_TEXT((metadata, 'groups'), 0)
    

    This code may give you ideas on a better way to do this:

    with inputdata as (
        select json_parse('{"name": "abc", "references": "abc", "owners": ["xyz", "abc"], "groups": ["x", "y"]}') as metadata
    )
    select i.metadata.name, i.metadata.references, o, g
    from inputdata i, i.metadata.owners o, i.metadata.groups g