google-bigquery google-cloud-storage federated-queries federated-table

How to read every file in a Google Storage bucket from BigQuery and create one table for each file in the bucket?

I have a bucket that has 28 files and I would like to load them in different tables (one table per file in the bucket).

Each file in the bucket looks like this:

file_1_2020.csv
file_2_2020.csv
..
file_28_2020.csv

I would like that the table had the first 6 characters of the filename + _table(they are all csv with headers). For example, file_1_table.

I read in a blog that I could achieve this by creating a federated query that list all files in the bucket. Then I could loop through all names and create the tables I need. The thing is that all documentation I have read on GCP is about federating queries in Cloud SQL, so I don't really know how I can make BigQuery read my bucket.

How can I achieve this? It's not important if it is done with a federated query (but I would like to know, because it's always good to know and learn new things). Also, I would like to have the tables partitioned and clustered by datetime, which is the first column of each file (the schema of each file is datetime:TIMESTAMP,col1:FLOAT,col2:FLOAT)

Solution

You can use EXECUTE IMMEDIATE to achieve this:

Step-1: Create an external table to get list of files in storage bucket

CREATE EXTERNAL TABLE mydataset.mytable OPTIONS (format = 'CSV', uris = ['gs://bucket_name/*.csv']);

Step-2: Use EXECUTE IMMEDIATE to create tables dynamically

DECLARE FILE_LIST ARRAY<STRING>;
DECLARE TABLE_NAME STRING;
DECLARE I INT64 DEFAULT 1;
DECLARE CNT INT64 DEFAULT 0;
SET FILE_LIST = ARRAY(SELECT DISTINCT _FILE_NAME as FILENAME FROM mydataset.mytable);
SET CNT = ARRAY_LENGTH(FILE_LIST);
WHILE I <= CNT 
DO
  SET TABLE_NAME = CONCAT(SUBSTR(REPLACE(SUBSTR(FILE_LIST[ORDINAL(i)], INSTR(FILE_LIST[ORDINAL(i)], '/', -1) + 1) ,'.csv', ''), 1, 6), '_table');
  EXECUTE IMMEDIATE "CREATE EXTERNAL TABLE mydataset." || TABLE_NAME || " OPTIONS (format = 'CSV', uris = ['" || FILE_LIST[ORDINAL(I)] || "'])";
  SET I = I + 1;
END WHILE;