Tags: sql, amazon-redshift

Redshift | JSON Flattening and storing it in a table


I have a JSON value

{
    "test": {        
        "userId": 77777,
        
        "sectionScores": [
            {
                "id": 2,
                "score": 244,
             
            },
            {
                "id": 1,
                "score": 212
                
            }
        ]
    }
}

Note: the order of sectionScores varies.

id 1 represents sectionname1, id 2 represents sectionname2, and id 3 represents sectionname3.

It may return at most 4 sections; it may also return as few as 2.

I have a Redshift table:

create table stage.poc1 (
user_id bigint,
json_data super
);

from which I want to write the flattening query and insert into the target table:

select user_id, json_data.test[0] ... I don't know how to write this.

create table target.poc (
user_id bigint,
sectionname1_score int,
sectionname2_score int,
sectionname3_score int,
sectionname4_score int
);
insert into target.poc

select user_id, json_data.test[0].score as sectionname1_score

... I don't know how to map each score to the right column dynamically.

Thanks in advance.


Solution

  • I made a few assumptions, but this should get you started.

    First, note that when working with SUPER values that contain uppercase letters, you need to enable case sensitivity in Redshift and double-quote any names that have uppercase letters.

    The easiest way to work with this is to directly convert your JSON strings to SUPER, then unnest the arrays into additional rows - see: https://docs.aws.amazon.com/redshift/latest/dg/query-super.html
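
    As a minimal sketch of that unnesting syntax applied to your stage table (assuming stage.poc1.json_data already holds parsed SUPER, as its column type suggests), navigating into the array in the FROM clause yields one row per array element:

    SET enable_case_sensitive_identifier TO true;

    -- Each row of stage.poc1 is joined to the elements of its own
    -- "sectionScores" array: one output row per (user, section).
    select p.user_id, ss.id, ss.score
    from stage.poc1 p, p.json_data.test."sectionScores" ss;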

    Below is sample SQL that creates some test data in a CTE (WITH clause), then unnests it and splits the values out into the columns you want. GROUP BY and SUM() then aggregate the rows down to the desired result.

    SET enable_case_sensitive_identifier TO true;
    
    with data as (
    select
      json_parse('{
        "test": {        
            "userId": 55555,        
            "sectionScores": [
                {
                    "id": 2,
                    "score": 544
                },
                {
                    "id": 1,
                    "score": 512
                    
                },
                {
                    "id": 3,
                    "score": 544
                 
                },
                {
                    "id": 4,
                    "score": 412
                    
                }
            ]
        }
    }') sp
    union all
    select
      json_parse('{
        "test": {        
            "userId": 77777,
            "sectionScores": [
                {
                    "id": 2,
                    "score": 244
                 
                },
                {
                    "id": 1,
                    "score": 212
                    
                }
            ]
        }
    }') sp
    union all
    select
      json_parse('{
        "test": {        
            "userId": 66666,
            "sectionScores": [
                {
                    "id": 2,
                    "score": 644             
                },
                {
                    "id": 1,
                    "score": 612
                    
                },
                {
                    "id": 3,
                    "score": 612
                    
                }
            ]
        }
    }') sp
    )
    select d.sp.test."userId" as user_id, 
      sum(decode(ss.id,1,ss.score)) as sectionname1_score,
      sum(decode(ss.id,2,ss.score)) as sectionname2_score,
      sum(decode(ss.id,3,ss.score)) as sectionname3_score,
      sum(decode(ss.id,4,ss.score)) as sectionname4_score
    from data d, d.sp.test."sectionScores" ss
    group by user_id
    order by user_id;
    

    The exact table definitions you have aren't completely clear, but from here you should be able to shape the output as you need.
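
    Putting the same pattern together with the tables from your question (a sketch, assuming stage.poc1.json_data holds parsed SUPER; DECODE returns NULL for sections that are absent, so SUM() leaves those columns NULL):

    SET enable_case_sensitive_identifier TO true;

    insert into target.poc
    select p.json_data.test."userId"::bigint as user_id,
      -- one DECODE per section id; cast the SUPER score before summing
      sum(decode(ss.id, 1, ss.score::int)) as sectionname1_score,
      sum(decode(ss.id, 2, ss.score::int)) as sectionname2_score,
      sum(decode(ss.id, 3, ss.score::int)) as sectionname3_score,
      sum(decode(ss.id, 4, ss.score::int)) as sectionname4_score
    from stage.poc1 p, p.json_data.test."sectionScores" ss
    group by 1;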