jsongoogle-bigqueryloadgoogle-bucket

Bigquery LOAD job failing when using JSON column in the destination


we have had a job running for quite some time every hour without any issues, however, today we have started to encounter an error when loading the data from Google Cloud bucket to bigquery table.

Nothing has changed in the procedure that runs it and when we try to re-run it with older data it fails too, even though it has been successful before, so this might be caused by some very recent update on BQ side?

We are experiencing this error:

The underlying data type of the failing table is JSON and it says that the data is STRUCT which should easily be interpreted as JSON, at least that's what I think and it has worked like that before.

Easy example for reproduction is exporting the data from a table with a JSON field and loading it back to the same table:


create table dataset.test (
  ts TIMESTAMP,
  Event JSON
);

insert into dataset.test
values (
"2023-03-15 13:35:48",
PARSE_JSON('{"Timestamp":"2023-03-15 13:35:48.0000000", "RandomColumn": "Random value"}')
);

EXPORT DATA
  OPTIONS (
    uri = 'gs://bucket_name/json_debug_example*.json',
    format = 'JSON',
    overwrite = true
  )
AS (
  select * from dataset.test
);

LOAD DATA into dataset.test FROM FILES ( format = 'JSON', uris = ['gs://bucket_name/json_debug_example*.json']);

This leads to an error: "Invalid value: Invalid schema update. Field Event has changed type from JSON to RECORD at [23:1]"

Any idea what could be an issue here?


Solution

  • I have just got an answer from the BigQuery Engineering team to use a schema specification in the load statement as a workaround. So if anyone is experiencing a similar issue, specifying the schema will solve it:

    LOAD DATA into dataset.test (ts TIMESTAMP, Event JSON) FROM FILES ( format = 'JSON', uris = ['gs://bucket_name/json_debug_example*.json']);