I would like to find the number of records for a list of tables that start with a common prefix via a SQL query.
In BigQuery I am trying a combination of CONCAT()
or FORMAT()
along with INFORMATION_SCHEMA.TABLES
in FOR
loops or WHILE
loops that in my mind should eventually be executed with EXECUTE IMMEDIATE
but I am not able to setup the final query.
This is what I have:
DECLARE query_string STRING;
FOR record IN (
select CONCAT(
"SELECT COUNT(*) AS cnt_rows,",
"\"", table_name, "\"", " AS t_name ",
"FROM ", CONCAT("`",table_schema,".", table_name, "`" )
) AS MY_STMNT
FROM `<<my-project-id>>.<<my-dataset>>.INFORMATION_SCHEMA.TABLES`
WHERE table_name LIKE "my-table-prefix__YYYY-MM-DDT%"
) DO
EXECUTE IMMEDIATE
-- 1st attempt:
-- FORMAT( """
-- %s
-- """, record.MY_STMNT);
-- 2nd attempt (this doesn't even execute because of "Syntax error: Unexpected keyword SET at xxx"):
-- SET query_string = query_string || record.MY_STMNT || " UNION ALL ";
END FOR;
This fails because the results are separate as one for each table. I would Like all of them in one final output result where each row is:
t_name
cnt_rows
How do I do that?
The solution is to use a string variable to accumulate SQL statements combined with UNION ALL
.
The issue with my previous attempts was that I wanted to EXECUTE IMMEDIATE
at each iteration in the for loop, instead of doing it after the entire loop.
DECLARE query_string STRING;
SET query_string = "";
FOR record IN (
-- dynamically build the SELECT query based on the data in INFORMATION_SCHEMA e.g.:
-- - the table name
-- - the number of rows in that table
SELECT CONCAT(
"SELECT ",
"'", table_name, "'", " AS t_name, ",
"COUNT(*) AS cnt_rows, ",
"FROM ", CONCAT("`", table_schema, ".", table_name, "`")
) AS MY_STMNT
FROM `<<my-project-id>>.<<my-dataset>>.INFORMATION_SCHEMA.TABLES`
WHERE table_name LIKE "my-table-prefix__YYYY-MM-DDT%"
) DO
SET query_string = query_string || record.MY_STMNT || " UNION ALL ";
END FOR;
-- just add a dummy "last row" to make "UNION ALL" work
SET query_string = query_string || "SELECT 'x' AS t_name, 0 AS cnt_rows";
-- make sure we rank the output table by the number of records from biggest to smallest
SET query_string = query_string || " ORDER BY cnt_rows DESC";
-- to inspect the SQL query, just print it
SELECT query_string;
-- invoke the SQL query
EXECUTE IMMEDIATE query_string;
Be careful though: if you combine together with UNION ALL
multiple complex SQL queries e.g. SELECT
, then BigQuery may throw an error like this:
Resources exceeded during query execution: Not enough resources for query planning - too many subqueries or query is too complex.. at [1:1]
This of course depends on the content of the SQL query you are building. The procedure above may apply to your case without throwing errors when invoked at scale.