google-bigqueryavro

Nullable date in avro schema for google pub/sub


I'm using avro as the schema for google pub/sub to write directly to BigQuery.

One of the fields can be null, so I've written my avro schema like this:

{
  "type": "record",
  "name": "Avro",
  "fields": [
    {
      "name": "id",
      "type": "string"
    },
    {
      "name": "status",
      "type": "string"
    },
    {
      "name": "createDate",
      "type": "string"
    },
    {
      "name": "purchaseDate",
      "type": ["null", "string"]
    }
  ]
}

However, for an input to fit with this schema, it has to look something like one of the below:

{
  "id": "123",
  "status": "not-purchased",
  "createDate": "2023-01-17T04:49:16.966Z",
  "purchaseDate": null
}
{
  "id": "123",
  "status": "purchased",
  "createDate": "2023-01-17T04:49:16.966Z",
  "purchaseDate":  {
    "string": "2023-01-17T04:49:16.966Z"
  }
}

The input in the 2nd example above is not in a format that is expected by the BigQuery subscription. I'm looking for something that looks like this instead:

{
  "id": "123",
  "status": "purchased",
  "createDate": "2023-01-17T04:49:16.966Z",
  "purchaseDate": "2023-01-17T04:49:16.966Z"
}

Is there something I did wrong with the avro schema or is it just the way it is for how nullable fields works in avro?


Solution

  • Update: This issue has now been fixed.

    Original answer: This is a known issue with Pub/Sub BigQuery subscriptions. You can follow the progress on a fix in the issue tracker. Once fixed, the example that uses the string keyword should work for inserting into BigQuery via Pub/Sub subscription.