I love how the SHOW TABLES
query return the rows and the bytes for each table listed, but unfortunately, that's not the case for the SHOW SCHEMAS
query.
What's the best way to get the total byte size of a list of schemas?
I've managed to do this by querying information_schema.tables
:
SELECT
table_schema AS schema,
SUM(bytes) AS bytes
FROM information_schema.tables
GROUP BY schema;
Note that this only works for the current database, as each database has its own information_schema
. So this wouldn't work to get the size of a list of databases. However, it's possible to get the total size of each DB individually:
SELECT SUM(bytes) AS bytes
FROM [database_name].information_schema.tables;
A nice complement to this is to use a UDF to format the byte size nicely, like the nicebytes
example here, resulting in nicebytes(SUM(bytes)) AS bytes
in the query.
EDIT: Many of those queries can be unioned in order to get the results for multiple databases/schemas at once. They can also be made to run daily and store the result into another table, which provides a nice way to monitor the size of databases & schemas over time:
SELECT
table_catalog AS database,
table_schema AS schema,
SUM(bytes) AS bytes
FROM database_1.information_schema.tables
GROUP BY database, schema
UNION ALL
SELECT
table_catalog AS database,
table_schema AS schema,
SUM(bytes) AS bytes
FROM database_2.information_schema.tables
GROUP BY database, schema;
EDIT 2: It is now possible (in 2024) for stored procedures to return tables, so the query above could now be dynamically generated instead of hardcoded from a list of databases. This Medium article gives an example of how to do that.