sqlamazon-athenaprestotrino

How to separate JSON key and value pair in Amazon Athena?


I need help on separating JSON key and value pair. I've spent hours of googling but haven't really found a way to do this. I'm hoping to find some answers here.

So, I have a dataset here which is composed of department(string) and assets(stringified JSON) . I was able to parse the assets into a JSON and what I want to do next is to separate its key and its value (which I illustrated below).

I've tried the UNNEST but it doesn't seem to support JSON value.

How can I achieve this?

Athena Query:

WITH dataset AS 
    (SELECT 'engineering' AS department, '{"number_of_assets": {"computer": "95"}}' AS assets )
SELECT *
FROM 
    (SELECT department,
         json_extract(assets,
         '$.number_of_assets') AS number_of_assets
    FROM dataset)

Current result: enter image description here

Desired result: enter image description here


Solution

  • You can cast your extracted json to MAP(VARCHAR, VARCHAR) (or MAP(VARCHAR, INTEGER)) and unnest the result:

    WITH dataset AS (
        SELECT 'engineering' AS department,
            '{"number_of_assets": {"computer": "95"}}' AS assets
    )
    
    SELECT department, asset, count
    FROM (
            SELECT department,
                cast(json_extract(assets, '$.number_of_assets') as MAP(VARCHAR, VARCHAR)) AS number_of_assets
            FROM dataset
        ) t
    CROSS JOIN UNNEST(number_of_assets) AS t (asset, count);
    

    Output:

    department asset count
    engineering computer 95