databricksdatabricks-sqldeltadatabricks-unity-catalog

How do I find the file size for my Delta tables in Databricks? I want to be able to expand it to multiple tables


I would like to know the total size of a table, as well as the file sizes of the files that comprise it.

Using describe detail works DESCRIBE DETAIL table1, but using the information as a table doesn't.

These fail

SELECT sizeInBytes FROM (Describe detail table1)

SELECT sizeInBytes FROM TABLE(Describe detail table1)

with this error

[PARSE_SYNTAX_ERROR] Syntax error at or near 'detail'. SQLSTATE: 42601 line 3, pos 39

== SQL ==
SELECT sizeInBytes FROM TABLE(Describe detail

I am using Unity Catalog for all data. I found this answer but its in Scala.


Solution

  • This works in SQL

    select
      count(*) as rows,
      file_path,
      file_size
    from
      (
        select
          *,
          _metadata.file_path,
          _metadata.file_size
        FROM
          table1
      )
    group by
      file_path,
      file_size
    

    Here is Python

    spark.sql("describe detail table1").select("sizeInBytes").collect()