We are using BigQuery streaming inserts with Dataflow, via the predefined Dataflow job template.
I ran into some peculiarities when using this with nullable and repeated fields.
For instance, with the schema
name STRING, NULLABLE
Attempting to insert the row {"name": null}
fails with the error:
generic::invalid_argument: This field is not a record. {"location": "name", "message": "This field is not a record."}
This is not such a big deal since it's easy enough to simply drop null fields, and similarly for empty arrays.
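To illustrate the workaround described above, here is a minimal Python sketch of a pre-insert cleanup step. The helper name `drop_empty_fields` is hypothetical and not part of the Dataflow template or any BigQuery client library; it simply removes null values and empty arrays from a row dict before it is handed to the streaming insert.

```python
def drop_empty_fields(row):
    """Hypothetical helper: strip None values and empty lists from a row
    dict so that NULLABLE fields are omitted rather than sent as null,
    and REPEATED fields are omitted rather than sent as []."""
    return {k: v for k, v in row.items() if v is not None and v != []}


row = {"name": None, "age": 42, "tags": []}
print(drop_empty_fields(row))  # only the populated field survives
```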
However, now if our schema is:
name STRING, REPEATED
and we want to insert the array ["a", "b", null, "c"],
we get a similar error, this time referencing the third element.
To provide a null value for a NULLABLE field, simply omit that field from the row you are inserting. As for your second example, a REPEATED field (an ARRAY in SQL terms) cannot contain a null element. To model an array of nullable STRINGs, you can instead use a REPEATED RECORD containing a single NULLABLE STRING field named value, or equivalently an ARRAY<STRUCT<value STRING>> in SQL terms. A null element is then represented by a record with the value field omitted.
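The encoding described above can be sketched as follows. This is a minimal example, assuming the REPEATED RECORD field wraps each element in a record whose single field is named value; the helper name `to_repeated_record` is an assumption for illustration, not an API of any BigQuery library.

```python
def to_repeated_record(values):
    """Hypothetical helper: encode a list of nullable strings as a
    REPEATED RECORD with one NULLABLE STRING field named 'value'.
    A null element becomes a record with the 'value' field omitted."""
    return [{} if v is None else {"value": v} for v in values]


# The failing array ["a", "b", null, "c"] becomes an insertable row:
row = {"name": to_repeated_record(["a", "b", None, "c"])}
print(row)  # {'name': [{'value': 'a'}, {'value': 'b'}, {}, {'value': 'c'}]}
```

On the SQL side, the original strings can be read back with an expression such as ARRAY(SELECT value FROM UNNEST(name)), since each struct's value field is NULL when omitted.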