I'm using an Avro schema on a Google Pub/Sub topic to write directly to BigQuery through a BigQuery subscription.
One of the fields can be null, so I've written my Avro schema like this:
{
  "type": "record",
  "name": "Avro",
  "fields": [
    {
      "name": "id",
      "type": "string"
    },
    {
      "name": "status",
      "type": "string"
    },
    {
      "name": "createDate",
      "type": "string"
    },
    {
      "name": "purchaseDate",
      "type": ["null", "string"]
    }
  ]
}
However, for an input to conform to this schema, it has to look like one of the following:
{
  "id": "123",
  "status": "not-purchased",
  "createDate": "2023-01-17T04:49:16.966Z",
  "purchaseDate": null
}
{
  "id": "123",
  "status": "purchased",
  "createDate": "2023-01-17T04:49:16.966Z",
  "purchaseDate": {
    "string": "2023-01-17T04:49:16.966Z"
  }
}
The second example above is not in the format the BigQuery subscription expects. I'm looking for something like this instead:
{
  "id": "123",
  "status": "purchased",
  "createDate": "2023-01-17T04:49:16.966Z",
  "purchaseDate": "2023-01-17T04:49:16.966Z"
}
Is there something wrong with my Avro schema, or is this just how nullable fields work in Avro?
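For reference, this is roughly how I'm producing the JSON payload. A minimal sketch with fastavro (an assumption about tooling; any encoder that follows Avro's JSON encoding behaves the same way) shows the wrapped output for the union field:

import io
import fastavro

# The schema from the question.
SCHEMA = fastavro.parse_schema({
    "type": "record",
    "name": "Avro",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "status", "type": "string"},
        {"name": "createDate", "type": "string"},
        {"name": "purchaseDate", "type": ["null", "string"]},
    ],
})

record = {
    "id": "123",
    "status": "purchased",
    "createDate": "2023-01-17T04:49:16.966Z",
    "purchaseDate": "2023-01-17T04:49:16.966Z",
}

# fastavro's json_writer emits the Avro JSON encoding, which wraps a
# non-null union value in an object keyed by its branch type name.
buf = io.StringIO()
fastavro.json_writer(buf, SCHEMA, [record])
print(buf.getvalue())
# {"id": "123", "status": "purchased", "createDate": "2023-01-17T04:49:16.966Z",
#  "purchaseDate": {"string": "2023-01-17T04:49:16.966Z"}}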
Update: This issue has now been fixed.
Original answer: This is a known issue with Pub/Sub BigQuery subscriptions. You can follow the progress on a fix in the issue tracker. Once fixed, the example that uses the "string" keyword should work for inserting into BigQuery via a Pub/Sub subscription.
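With the fix in place, publishing the JSON-encoded message with the string-keyed union value to a topic that has this Avro schema attached (and a BigQuery subscription) should land in BigQuery. A minimal sketch with the google-cloud-pubsub client, using hypothetical project and topic names:

import json
from google.cloud import pubsub_v1

# Hypothetical project and topic; the topic is assumed to have the Avro
# schema above attached with JSON message encoding.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "my-topic")

payload = {
    "id": "123",
    "status": "purchased",
    "createDate": "2023-01-17T04:49:16.966Z",
    # Non-null union values use the Avro JSON encoding's type wrapper.
    "purchaseDate": {"string": "2023-01-17T04:49:16.966Z"},
}

future = publisher.publish(topic_path, data=json.dumps(payload).encode("utf-8"))
print(future.result())  # message ID once the publish succeeds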