jsonairflowndjson

Export as JSON using BigQueryToCloudStorageOperator


enter image description here

When I use the BigQuery console manually, I can see that the 3 options when exporting a table to GCS are CSV, JSON (Newline delimited), and Avro.

With Airflow, when using the BigQueryToCloudStorageOperator operator, what is the correct value to pass to export_format in order to transfer the data to GCS as JSON (Newline delimited)? Is it simply JSON? All examples I've seen online for BigQueryToCloudStorageOperator use export_format='CSV', never for JSON, so I'm not sure what the correct value here is. Our use case needs JSON, since the 2nd task in our DAG (after transferring data to GCS) is to then load that data from GCS into our MongoDB Cluster with mongoimport.


Solution

  • According to the BigQuery documentation the three possible formats to which you can export BigQuery query results are: CSV, JSON, and Avro (and this is compatible with the UI drop-down menu).

    enter image description here

    I would try with export_format='JSON' as you already proposed.