databasesnowflake-cloud-data-platformsizedatabase-schema

How to get the total byte size of schemas & databases in Snowflake


I love how the SHOW TABLES query return the rows and the bytes for each table listed, but unfortunately, that's not the case for the SHOW SCHEMAS query.

What's the best way to get the total byte size of a list of schemas?


Solution

  • I've managed to do this by querying information_schema.tables:

    SELECT
      table_schema AS schema,
      SUM(bytes) AS bytes
    FROM information_schema.tables
    GROUP BY schema;
    

    Note that this only works for the current database, as each database has its own information_schema. So this wouldn't work to get the size of a list of databases. However, it's possible to get the total size of each DB individually:

    SELECT SUM(bytes) AS bytes
    FROM [database_name].information_schema.tables;
    

    A nice complement to this is to use a UDF to format the byte size nicely, like the nicebytes example here, resulting in nicebytes(SUM(bytes)) AS bytes in the query.


    EDIT: Many of those queries can be unioned in order to get the results for multiple databases/schemas at once. They can also be made to run daily and store the result into another table, which provides a nice way to monitor the size of databases & schemas over time:

    SELECT
      table_catalog AS database,
      table_schema AS schema,
      SUM(bytes) AS bytes
    FROM database_1.information_schema.tables
    GROUP BY database, schema
    UNION ALL
    SELECT
      table_catalog AS database,
      table_schema AS schema,
      SUM(bytes) AS bytes
    FROM database_2.information_schema.tables
    GROUP BY database, schema;
    

    EDIT 2: It is now possible (in 2024) for stored procedures to return tables, so the query above could now be dynamically generated instead of hardcoded from a list of databases. This Medium article gives an example of how to do that.