I have a bucket that contains 28 files, and I would like to load each of them into its own table (one table per file in the bucket).
Each file in the bucket looks like this:
file_1_2020.csv
file_2_2020.csv
..
file_28_2020.csv
I would like each table to be named with the first 6 characters of the filename plus _table (they are all CSVs with headers). For example: file_1_table.
I read in a blog that I could achieve this by creating a federated query that lists all the files in the bucket; then I could loop through the names and create the tables I need. The problem is that all the GCP documentation I have found covers federated queries against Cloud SQL, so I don't know how to make BigQuery read my bucket.
How can I achieve this? It doesn't have to be done with a federated query (though I would like to know how, because it's always good to learn new things). I would also like the tables to be partitioned and clustered by datetime, which is the first column of each file (the schema of each file is datetime:TIMESTAMP,col1:FLOAT,col2:FLOAT).
You can use EXECUTE IMMEDIATE in a BigQuery script to achieve this:
Step-1: Create an external table over the whole bucket. This table is only used to read the _FILE_NAME pseudo-column, which returns the full gs:// path of the file each row came from:
CREATE EXTERNAL TABLE mydataset.mytable
OPTIONS (format = 'CSV', uris = ['gs://bucket_name/*.csv']);
Step-2: Use EXECUTE IMMEDIATE to create the tables dynamically
DECLARE FILE_LIST ARRAY<STRING>;
DECLARE TABLE_NAME STRING;
DECLARE I INT64 DEFAULT 1;
DECLARE CNT INT64 DEFAULT 0;

-- _FILE_NAME holds the full gs:// path of the source file for each row
SET FILE_LIST = ARRAY(SELECT DISTINCT _FILE_NAME FROM mydataset.mytable);
SET CNT = ARRAY_LENGTH(FILE_LIST);

WHILE I <= CNT DO
  -- Strip the path and the '_2020.csv' suffix, then append '_table'.
  -- (Taking only the first 6 characters, as originally asked, would collide
  -- for file_10..file_28: file_2 and file_28 would both become file_2_table.)
  SET TABLE_NAME = CONCAT(REPLACE(SUBSTR(FILE_LIST[ORDINAL(I)], INSTR(FILE_LIST[ORDINAL(I)], '/', -1) + 1), '_2020.csv', ''), '_table');
  -- skip_leading_rows = 1 keeps the header row out of the data
  EXECUTE IMMEDIATE "CREATE EXTERNAL TABLE mydataset." || TABLE_NAME
    || " OPTIONS (format = 'CSV', skip_leading_rows = 1, uris = ['" || FILE_LIST[ORDINAL(I)] || "'])";
  SET I = I + 1;
END WHILE;
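
One caveat for the partitioning part of your question: external tables cannot be partitioned or clustered. To get native tables partitioned and clustered on datetime, you can instead point a temporary external table at each single file and materialize it with CREATE TABLE ... PARTITION BY ... CLUSTER BY ... AS SELECT. A sketch of the loop body under that approach (mydataset.tmp_ext is a placeholder name; the variables are the same as above):

-- 1) Point a temporary external table at the single file, with the schema
--    from the question (FLOAT maps to FLOAT64 in standard SQL DDL).
EXECUTE IMMEDIATE "CREATE OR REPLACE EXTERNAL TABLE mydataset.tmp_ext"
  || " (datetime TIMESTAMP, col1 FLOAT64, col2 FLOAT64)"
  || " OPTIONS (format = 'CSV', skip_leading_rows = 1, uris = ['" || FILE_LIST[ORDINAL(I)] || "'])";
-- 2) Materialize it into a native table, partitioned by day and
--    clustered on the datetime column.
EXECUTE IMMEDIATE "CREATE TABLE mydataset." || TABLE_NAME
  || " PARTITION BY DATE(datetime) CLUSTER BY datetime"
  || " AS SELECT * FROM mydataset.tmp_ext";

Partitioning is per day here (PARTITION BY DATE(datetime)); use TIMESTAMP_TRUNC(datetime, HOUR) instead if you need finer granularity.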