google-bigquery google-cloud-storage

SQL database to BigQuery, or SQL database to GCS to BigQuery?


In the book Data Engineering with Google Cloud Platform by Adi Wijaya, when loading data from a SQL database into BigQuery, the author always loads the data into Google Cloud Storage first, uses it as a staging environment, and only then loads the data into BigQuery.

What are the advantages of going through the GCS step rather than loading straight into BigQuery? In which cases would you load data directly from a SQL database into BigQuery?


Solution

  • As mentioned in this post, BigQuery doesn't support the SQL format as a load source, so a SQL export from Cloud SQL cannot be loaded into BigQuery as-is. You can use either of the following approaches:

    1. Use a BigQuery Cloud SQL federated query to import data into BigQuery directly from Cloud SQL.
    2. As described in this documentation, first export CSV or JSON files from the Cloud SQL database to Cloud Storage, then load those files into BigQuery.
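
    As a rough illustration of the first option, the sketch below builds a statement that wraps `EXTERNAL_QUERY` in a `CREATE OR REPLACE TABLE ... AS SELECT` so the federated result is materialised in BigQuery. The connection ID, source query, and table names are hypothetical placeholders, and the helper names are invented for this example; it assumes a Cloud SQL connection has already been created in BigQuery and that the `google-cloud-bigquery` package is installed for the step that actually runs the query.

```python
def build_federated_copy_sql(connection_id: str, source_query: str,
                             destination_table: str) -> str:
    """Return a BigQuery statement that copies a Cloud SQL query result into a table.

    Quoting is deliberately simple: the source query is embedded in a
    triple-quoted string, so it must not contain triple double quotes.
    """
    if '"""' in source_query:
        raise ValueError("source_query must not contain triple double quotes")
    return (
        f"CREATE OR REPLACE TABLE `{destination_table}` AS\n"
        f"SELECT * FROM EXTERNAL_QUERY(\n"
        f'  "{connection_id}",\n'
        f'  """{source_query}"""\n'
        f")"
    )


def run_federated_copy(sql: str) -> None:
    """Run the statement in BigQuery (needs credentials and network access)."""
    # Imported lazily so the SQL builder above works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()   # uses Application Default Credentials
    client.query(sql).result()   # blocks until the query job finishes


# Hypothetical names, for illustration only.
example_sql = build_federated_copy_sql(
    connection_id="my-project.us.my-cloudsql-connection",
    source_query="SELECT id, name FROM customers",
    destination_table="my-project.staging.customers",
)
print(example_sql)
```

    Calling `run_federated_copy(example_sql)` would execute the copy. Because the query runs against the live Cloud SQL instance, this route skips the staging files but puts the read load on the source database.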

    The advantages of loading data from Cloud SQL to Cloud Storage and then into BigQuery are:

    When you load data into BigQuery from Cloud Storage, you are not charged for the load operation, but you do incur charges for storing the data in Cloud Storage.

    In addition, storing data in BigQuery costs more than storing it in Cloud Storage. Note that you are subject to certain limitations when you load data into BigQuery from a Cloud Storage bucket.
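
    The load step from Cloud Storage into BigQuery could look roughly like the sketch below, which starts a batch load job over every CSV file under a bucket prefix. The bucket, prefix, and table names are hypothetical, the helper names are invented for this example, and it assumes the `google-cloud-bigquery` package is installed and the files were already exported from Cloud SQL. Schema autodetection is used only for brevity; an explicit schema is usually the safer choice.

```python
def gcs_wildcard_uri(bucket: str, prefix: str, extension: str = "csv") -> str:
    """Build a gs:// URI matching every exported file under a prefix."""
    clean_prefix = prefix.strip("/")
    if not clean_prefix:
        return f"gs://{bucket}/*.{extension}"
    return f"gs://{bucket}/{clean_prefix}/*.{extension}"


def load_csv_from_gcs(uri: str, table_id: str) -> int:
    """Run a batch load job and return the destination table's row count."""
    # Imported lazily so the URI helper above works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses Application Default Credentials
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        autodetect=True,  # assumption: let BigQuery infer the schema
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    load_job.result()  # blocks until the load job finishes
    return client.get_table(table_id).num_rows


# Hypothetical bucket and prefix, for illustration only.
staging_uri = gcs_wildcard_uri("my-staging-bucket", "exports/customers")
print(staging_uri)
```

    Calling `load_csv_from_gcs(staging_uri, "my-project.staging.customers")` would then load the staged files. `WRITE_TRUNCATE` replaces the table on each run, which suits a full reload; an append-style pipeline would use `WRITE_APPEND` instead.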

    The best strategy depends on your use case, and your question would need more detail before one could be recommended. More information on loading data can be found in the BigQuery documentation.