google-bigquerybigquery-udf

BigQuery extract nested JSON with dynamic keys and values


I would like to extract nested JSON with dynamic keys. I am currently able to extract keys with the aid of 1 and 2, but parsing the value of JSON object is producing unexpected results. "[object Object]"

What would be the proper way of parsing the nested JSON with dynamic keys and values? (I would prefer not to use a custom JS UDF but I am not sure if existing JSON functions can handle the problem.)

Input column of the record
{
    "key1":{"ItemID":1,"UseCount":4,"ItemCount":7},
    "key2":{"ItemID":2,"UseCount":5,"ItemCount":8},
    "key3":{"ItemID":3,"UseCount":6,"ItemCount":9}
    ...
}
Current Query

bigquery-utils: json_extract_keys() json_extract_values()

WITH
sample_logs AS (
    SELECT '{"key1":{"ItemID":1,"UseCount":4,"ItemCount":7},"key2":{"ItemID":2,"UseCount":5,"ItemCount":8},"key3":{"ItemID":3,"UseCount":6,"ItemCount":9}}' as json_string,
    UNION ALL SELECT '{"key4":{"ItemID":1,"UseCount":4,"ItemCount":7},"key5":{"ItemID":2,"UseCount":5,"ItemCount":8}}'
)
SELECT
    json_string,
    key,
    TO_JSON_STRING(value) as value,
FROM sample_logs
CROSS JOIN UNNEST(bqutil.fn.json_extract_keys(json_string)) as key WITH OFFSET
INNER JOIN UNNEST(bqutil.fn.json_extract_values(json_string)) as value WITH OFFSET USING (OFFSET)
;
Results

enter image description here

Expected Results
JSON_STRING1  |  "key1"   |  {"ItemID":1,"UseCount":4,"ItemCount":7}  -- <- not [object Object]
JSON_STRING1  |  "key2"   |  {"ItemID":2,"UseCount":5,"ItemCount":8}
JSON_STRING1  |  "key3"   |  {"ItemID":3,"UseCount":6,"ItemCount":9}
JSON_STRING2  |  "key4"   |  {"ItemID":1,"UseCount":4,"ItemCount":7}
JSON_STRING2  |  "key5"   |  {"ItemID":2,"UseCount":5,"ItemCount":8}

Solution

  • You can consider below.

    WITH
    sample_logs AS (
        SELECT '{"key1":{"ItemID":1,"UseCount":4,"ItemCount":7},"key2":{"ItemID":2,"UseCount":5,"ItemCount":8},"key3":{"ItemID":3,"UseCount":6,"ItemCount":9}}' as json_string,
        UNION ALL SELECT '{"key4":{"ItemID":1,"UseCount":4,"ItemCount":7},"key5":{"ItemID":2,"UseCount":5,"ItemCount":8}}'
    )
    SELECT json_string,
           key,
           PARSE_JSON(json_string)[key] value
      FROM sample_logs, 
    UNNEST(bqutil.fn.json_extract_keys(json_string)) as key;
    

    Query result

    enter image description here