sqlamazon-athenaprestotrinojson-query

"INVALID_FUNCTION_ARGUMENT: Cannot unnest type: varchar" in Athena JSON


I have a requirement to print the following output:

matchId rainyDate sunnyDate
1 2024-09-04 2024-09-04
1 2024-09-11 2024-09-12
1 2024-09-18 2024-09-19
2 2024-10-04 2024-10-04
2 2024-10-11
3

There is a list of matches, which can be Practice or Match and I'm only interested in non-Practice matches. Each match will have an array of a pair of dates, rainyDate and sunnyDate.

The following SQL fails with the error:

"INVALID_FUNCTION_ARGUMENT: Cannot unnest type: varchar":

WITH matches AS (SELECT 1 AS matchId, '
{
  "matchDetail": [
    {
      "matchType": "Practice",
      "matchDate": [
        {
          "rainyDate": "2024-09-01",
          "sunnyDate": "2024-09-02"
        },
        {
          "rainyDate": "2024-09-07",
          "sunnyDate": "2024-09-07"
        }
      ]
    },
    {
      "matchType": "Match",
      "matchDate": [
        {
          "rainyDate": "2024-09-04",
          "sunnyDate": "2024-09-04"
        },
        {
          "rainyDate": "2024-09-11",
          "sunnyDate": "2024-09-12"
        },
        {
          "rainyDate": "2024-09-18",
          "sunnyDate": "2024-09-19"
        }
      ]
    }
  ]
}' AS matchDetails
UNION
SELECT 2, '
{
  "matchDetail": [
    {
      "matchType": "Match",
      "matchDate": [
        {
          "rainyDate": "2024-10-04",
          "sunnyDate": "2024-10-04"
        },
        {
          "rainyDate": "2024-10-11"
        }
      ]
    }
  ]
}'
UNION
SELECT 3, '
{
  "matchDetail": [
    {
      "matchType": "Match"
    }
  ]
}'
)
SELECT
matchId
,t.rainyDate
,t.sunnyDate
FROM matches CROSS JOIN UNNEST(json_array(json_query(matchDetails, 'lax $."matchDetail"[*]?(@."matchType"=="Match")."matchDate"'))) AS t(matchdates)
ORDER BY matchId

The TYPEOF function returns a varchar despite adding the function json_array.

And then casting the above to JSON:

UNNEST(CAST(json_array(json_query(matchDetails, 'lax $."matchDetail"[*]?(@."matchType"=="Match")."matchDate"')) AS JSON)) AS t(matchdates)

gives an error:

INVALID_FUNCTION_ARGUMENT: Cannot unnest type: json

I am unable determine what's the shortcoming and will be grateful for any help.


Solution

  • As docs for json_array state:

    The SQL standard imposes that there is no dedicated data type to represent JSON data in SQL. Instead, JSON data is represented as character or binary strings. By default, the json_array function returns varchar containing the textual representation of the JSON array.

    So json_array results in varchar. If you want to unnest the data you can leverage the JSON type at the moment available in Trino and cast it to array of some type. For example via array(ROW):

    cast(
        json_parse(json_query(matchDetails, 'lax $."matchDetail"[*]?(@."matchType"=="Match")."matchDate"')) 
           AS array(row(rainyDate varchar, sunnyDate varchar))
    )
    

    With full query looking like:

    -- sample data
    WITH matches(matchId, matchDetails) AS (values
        (1, '
        {
          "matchDetail": [
            {
              "matchType": "Practice",
              "matchDate": [
                {
                  "rainyDate": "2024-09-01",
                  "sunnyDate": "2024-09-02"
                },
                {
                  "rainyDate": "2024-09-07",
                  "sunnyDate": "2024-09-07"
                }
              ]
            },
            {
              "matchType": "Match",
              "matchDate": [
                {
                  "rainyDate": "2024-09-04",
                  "sunnyDate": "2024-09-04"
                },
                {
                  "rainyDate": "2024-09-11",
                  "sunnyDate": "2024-09-12"
                },
                {
                  "rainyDate": "2024-09-18",
                  "sunnyDate": "2024-09-19"
                }
              ]
            }
          ]
        }'),
        (2, '
        {
          "matchDetail": [
            {
              "matchType": "Match",
              "matchDate": [
                {
                  "rainyDate": "2024-10-04",
                  "sunnyDate": "2024-10-04"
                },
                {
                  "rainyDate": "2024-10-11"
                }
              ]
            }
          ]
        }'),
        (3, '
        {
          "matchDetail": [
            {
              "matchType": "Match"
            }
          ]
        }'))
    
    -- query
    SELECT matchId
        , t.rainyDate
        , t.sunnyDate
    FROM matches
        CROSS JOIN UNNEST(
                cast(json_parse(json_query(matchDetails, 'lax $."matchDetail"[*]?(@."matchType"=="Match")."matchDate"')) as array(row(rainyDate varchar, sunnyDate varchar)))
            ) AS t(rainyDate, sunnyDate)
    ORDER BY matchId
    

    Which results in the following output:

    matchId rainyDate sunnyDate
    1 2024-09-04 2024-09-04
    1 2024-09-11 2024-09-12
    1 2024-09-18 2024-09-19
    2 2024-10-04 2024-10-04
    2 2024-10-11