sqljsonduckdb

Accessing deeply nested fields using DuckDB SQL


I have some deeply nested data from a json file and I'm trying to load it into DuckDB:

  "events": [
    {
      "id": "401638586",
      "uid": "s:40~l:41~e:401638586",
      "date": "2024-03-21T18:45Z",
      "name": "Wagner Seahawks at North Carolina Tar Heels",
      "shortName": "WAG VS UNC",
      "season": {
        "year": 2024,
        "type": 3,
        "slug": "post-season"
      },
      "competitions": [
        {
          "id": "401638586",
          "uid": "s:40~l:41~e:401638586~c:401638586",
          "date": "2024-03-21T18:45Z",
          "attendance": 18223,
          "type": {
            "id": "6",
            "abbreviation": "TRNMNT"
          },
          "timeValid": true,
          "neutralSite": true,
          "conferenceCompetition": false,
          "playByPlayAvailable": true,
          "recent": false,
.
.
.

The query I'm using looks like:

select 
    games['id'] as id, 
    games['date'] as date,
    games['season']['year'] as season_year,
    games['season']['slug'] as season_slug,
    '2024-03-21' as partition_date, 
    games['name'] as name, 
    games['shortName'] as short_name, 
    games['status']['period'] as period,
    games['status']['type']['completed'] as completed, 
    games['competitions'][0]['neutralSite'] as neutral, 
    games['competitions'][0]['conferenceCompetition'] as in_conference, 
    games['competitions'][0]['playByPlayAvailable'] as pbp_available 
from (
    select 
        unnest(events) as games 
    from read_json('/path/to/json/data')
)
limit 1;

The output looks like this:

            id = 401638586
          date = 2024-03-21T18:45Z
   season_year = 2024
   season_slug = post-season
partition_date = 2024-03-21
          name = Wagner Seahawks at North Carolina Tar Heels
    short_name = WAG VS UNC
        period = 2
     completed = true
       neutral = 
 in_conference = 
 pbp_available = 

You can see that once it gets into the nested 'competitions' field it just returns empty/null values. How can I access those fields correctly? I have tried using json_extract but can't seem to get it to work.


Solution

  • You're using [0] instead of [1]

    See the warning in the docs:

    Following PostgreSQL's conventions, DuckDB uses 1-based indexing for arrays and lists but 0-based indexing for the JSON data type.

    Using [1] will produce the expected values.

    games['competitions'][1]['neutralSite'] as neutral, 
    games['competitions'][1]['conferenceCompetition'] as in_conference, 
    games['competitions'][1]['playByPlayAvailable'] as pbp_available 
    
    Rows: 1
    Columns: 10
    $ id              <str> '401638586'
    $ date            <str> '2024-03-21T18:45Z'
    $ season_year     <i64> 2024
    $ season_slug     <str> 'post-season'
    $ partition_date  <str> '2024-03-21'
    $ name            <str> 'Wagner Seahawks at North Carolina Tar Heels'
    $ short_name      <str> 'WAG VS UNC'
    $ neutral        <bool> True
    $ in_conference  <bool> False
    $ pbp_available  <bool> True