jsonduckdb

Create table from JSON


The DuckDB documentation states that you can create a table starting from a JSON file, with DuckDB automatically constructing the table schema, for instance:

CREATE TABLE todos AS
    SELECT * FROM 'todos.json';

But how can I pass JSON as a string?

I have tried:

CREATE TABLE todos AS SELECT * FROM json('[{"familiy": "Mark"}, {"family": "Nej"}]');

But it says:

Catalog Error:
Table Function with name "json" is not in the catalog, a function by this name exists in the json extension, but it's of a different type, namely Macro Function

The use case is that I would like to save data from OpenAlex, for example the list of works of Claudia Goldin, and it would be good to have DuckDB create the schema itself without recurring to manually define a SQLAlchemy ORM table.


Solution

  • Inputting json_structure as structure for from_json

    (working in DuckDB 1.2.2)

    Thanks to this article from SeaChess, it appeared that asking json_structure to analyze the items of the JSON array, then passing it to from_json, will generate a struct with all fields present:

    create table todos as
    select j.* from (select unnest(from_json('[{"familiy": "Mark"}, {"family": "Nej"}]', json_structure('[{"familiy": "Mark"}, {"family": "Nej"}]'))) j);
    select * from todos;
    

    Note that "JSON structure must be a constant!", thus the necessity to pass the JSON string (you cannot call it with a column of an existing table as parameter).

    A possibility (used in the aforementioned article) is to put the structure in a variable, which in the final query will be considered as a constant:

    create temp table d as select json('[{"familiy": "Mark"}, {"family": "Nej"}]') d;
    set variable json_schema = (select json_structure(d) from d);
    create table todos as
    select j.* from (select unnest(from_json(d, getvariable('json_schema'))) j from d);
    select * from todos;
    
    familiy (varchar) family (varchar)
    Mark NULL
    NULL Nej