sqlclickhouse

Normalize JSON array in Clickhouse


I have a Clickhouse table with following structure:

id json
111 [{"productId": "718f4d00-210d-43f1-9c9c-e97733d38972", "cost": "170.00000"}, {"productId": "df145ba0-ff68-4370-88e1-1c0ad246b1b7", "cost": "230.00000"}]
112 [{"productId": "718f4d00-210d-43f1-9c9c-e97733d38972", "cost": "170.00000"}]
113 [{"productId": "718f4d00-210d-43f1-9c9c-e97733d38972", "cost": 170}, {"productId": "df145ba0-ff68-4370-88e1-1c0ad246b1b7", "cost": 230}]

I need to normalize JSON column keeping its row id.

Expected result:

id productId cost
111 718f4d00-210d-43f1-9c9c-e97733d38972 170
111 df145ba0-ff68-4370-88e1-1c0ad246b1b7 230
112 718f4d00-210d-43f1-9c9c-e97733d38972 170
113 718f4d00-210d-43f1-9c9c-e97733d38972 170
113 df145ba0-ff68-4370-88e1-1c0ad246b1b7 230

I've tried basic JSON functions but they don't work with arrays like mine.

Is there a way to explode JSON row wise?


Solution

  • Extract JSON to Array of Tuple.
    Several number of "tuple" in array expand with "ARRAY JOIN arrayEnumerate".

    id jsonv
    111 [{"productId": "718f4d00-210d-43f1-9c9c-e97733d38972", "cost": "170.10000"}, {"productId": "df145ba0-ff68-4370-88e1-1c0ad246b1b7", "cost": "230.00000"}]
    112 [{"productId": "718f4d00-210d-43f1-9c9c-e97733d38972", "cost": "170.00000"}]
    113 [{"productId": "718f4d00-210d-43f1-9c9c-e97733d38972", "cost": 170}, {"productId": "df145ba0-ff68-4370-88e1-1c0ad246b1b7", "cost": 230}]
    select id,num, art[num].1 productId, art[num].2 cost
    from(
       select id ,JSONExtract(jsonv,'Array(Tuple(productId String, cost String))') art
       from test
    )a
    ARRAY JOIN arrayEnumerate(art) AS num;
    
    id num productId cost
    111 1 718f4d00-210d-43f1-9c9c-e97733d38972 170.10000
    111 2 df145ba0-ff68-4370-88e1-1c0ad246b1b7 230.00000
    112 1 718f4d00-210d-43f1-9c9c-e97733d38972 170.00000
    113 1 718f4d00-210d-43f1-9c9c-e97733d38972 170
    113 2 df145ba0-ff68-4370-88e1-1c0ad246b1b7 230

    Update1

    Array of tuple( "JSONExtract(jsonv,'Array(Tuple(productId String, cost String))')" )

    id a1
    111 [('718f4d00-210d-43f1-9c9c-e97733d38972','170.10000'),('df145ba0-ff68-4370-88e1-1c0ad246b1b7','230.00000')]
    112 [('718f4d00-210d-43f1-9c9c-e97733d38972','170.00000')]
    113 [('718f4d00-210d-43f1-9c9c-e97733d38972','170'),('df145ba0-ff68-4370-88e1-1c0ad246b1b7','230')]

    Example fiddle

    Short example with "ARRAY JOIN arrayEnumerate(a)"

    select num,a[num] an
    from(select ['A','B','C'] a) t
    ARRAY JOIN arrayEnumerate(a) as num
    

    Output is

    num an
    1 A
    2 B
    3 C