google-bigqueryschemapipeline

Creating an empty table based on another table's schema in BigQuery


We have a BigQuery dataset that has some long list of tables (with data) in it. Since I am taking over a data pipeline, which I want to familiarize myself with by doing tests, I want to duplicate those dataset/tables without copying-truncating the tables. Essentially, I want to re-create those tables in a test dataset using their schema. How can this be done in bq client?


Solution

  • You have a couple of options considering you don't want to copy the data but the schema:

    1.- extract the schema for each table and then create new ones just empty.

    $ bq show --schema --format=prettyjson [PROJECT_ID]:[DATASET].[TABLE] > [SCHEMA_FILE]
    
    $ bq mk --table [PROJECT_ID]:[NEW_DATASET].[TABLE] [SCHEMA_FILE]
    

    2.- run a query with LIMIT 0 and setting a destination table.

    bq query "SELECT * FROM [DATASET].[TABLE] LIMIT 0" --destination_table [NEW_DATASET].[TABLE]
    

    NOTE: As per comment, it is important to highlight that LIMIT 0 will still scan the whole table.