Tags: databricks, catalog, databricks-unity-catalog, data-governance

Catalogs in Databricks


I have started reading about the Unity Catalog that Databricks has introduced. I understand the basic issue that it is trying to solve, but I do not understand what exactly a Catalog is.

The Databricks documentation says:

A catalog contains schemas (databases), and a schema contains tables and views.

https://docs.databricks.com/data-governance/unity-catalog/create-catalogs.html

How does this added layer (on top of schemas) help? I am guessing it has something to do with governance?

I would really appreciate an example, if possible.


Solution

  • Really, a catalog is another data management layer inside a bigger object, the Unity Catalog metastore. The closest analogy to a catalog is a single Hive metastore: it also contains databases (schemas) that contain tables and views. Catalogs can be used to isolate the objects of one entity (business unit, project, environment (dev, staging, prod), ...) from the objects of other entities. You can grant manage permissions on the catalogs to the respective admins of the business units, projects, ..., and they can then assign permissions on individual schemas and tables/views.
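
Since an example was requested, here is a minimal sketch of per-environment isolation. It only builds the SQL statements as strings so that it runs anywhere; in a Databricks notebook you would pass each one to `spark.sql(...)`. All names (the catalogs `dev` and `prod`, the schema `sales`, the table `orders`, the groups `dev-admins`, `prod-admins` and `analysts`) are made up for illustration, and transferring ownership is just one possible way to delegate administration, so check the privilege model in the docs for your workspace.

```python
from typing import List


def three_level_name(catalog: str, schema: str, table: str) -> str:
    """Unity Catalog addresses a table as catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def catalog_setup(catalog: str, schema: str, admin_group: str) -> List[str]:
    """Statements that create one isolated catalog and hand it to its admins."""
    return [
        f"CREATE CATALOG IF NOT EXISTS {catalog}",
        f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema}",
        # Assumption: delegation is done by making the admin group the owner.
        f"ALTER CATALOG {catalog} OWNER TO `{admin_group}`",
    ]


def read_access(catalog: str, schema: str, table: str, group: str) -> List[str]:
    """Statements a catalog admin could run to let a group read one table."""
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{group}`",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{group}`",
        f"GRANT SELECT ON TABLE {three_level_name(catalog, schema, table)} TO `{group}`",
    ]


if __name__ == "__main__":
    # One catalog per environment, each with its own (hypothetical) admin group.
    for env, admins in [("dev", "dev-admins"), ("prod", "prod-admins")]:
        for statement in catalog_setup(env, "sales", admins):
            print(statement)

    # The prod admins then decide who may read prod.sales.orders.
    for statement in read_access("prod", "sales", "orders", "analysts"):
        print(statement)
```

The same schema and table names can exist in both catalogs without clashing, because `dev.sales.orders` and `prod.sales.orders` are different objects with separate permissions.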