Tags: azure, databricks, azure-databricks, databricks-unity-catalog

Azure Databricks Unity Catalog - Cannot access Managed Volume in notebook


The problem

After setting up Unity Catalog and a managed Volume, I can upload files to and download files from the volume in the Databricks Workspace UI.

However, I cannot access the volume from a notebook. I created an All-purpose compute and ran dbutils.fs.ls("/Volumes/catalog1/schema1/volumn11"), and got the error:

Operation failed: "This request is not authorized to perform this operation.", 403, GET

How I set up Unity Catalog and the Managed Volume

  1. I am the Azure Databricks Account Admin, Metastore Admin, and Workspace Admin
  2. I created an Azure Databricks Workspace (Premium Tier)
  3. I created a Databricks Metastore, named metastore1
  4. I created an Azure ADLS Gen2 storage account (with Hierarchical namespace enabled), named adsl_gen2_1
  5. I created an Azure Access Connector for Azure Databricks (as an Azure Managed Identity), named access_connector_for_dbr_1
  6. On adsl_gen2_1, I assigned the roles Storage Blob Data Contributor and Storage Queue Data Contributor to access_connector_for_dbr_1
  7. I created two ADLS Gen2 containers under adsl_gen2_1
    • One named adsl_gen2_1_container_catalog_default
    • Another one named adsl_gen2_1_container_schema1
  8. I created a Databricks Storage Credential, named dbr_strg_cred_1
    • The connector id is the resource id of access_connector_for_dbr_1
    • The Permissions of the Storage Credential were not set (empty)
  9. I created two Databricks External Locations, both using dbr_strg_cred_1
    • One named dbr_ext_loc_catalog_default, pointing to the ADLS Gen2 container adsl_gen2_1_container_catalog_default; the Permissions of this External Location were not set (empty)
    • Another named dbr_ext_loc_schema1, pointing to the ADLS Gen2 container adsl_gen2_1_container_schema1; the Permissions of this External Location were not set (empty)
  10. I created a Databricks Catalog, named catalog1, under metastore1, and set dbr_ext_loc_catalog_default as this catalog's Storage Location
  11. I created a Databricks Schema, named schema1, under catalog1, and set dbr_ext_loc_schema1 as this schema's Storage Location
  12. I created a Databricks Volume, named volumn11, under schema1 (a SQL sketch of steps 9 to 12 follows this list).
  13. In the Databricks UI, I can upload files to and download files from volumn11
  14. However, when I created an All-purpose compute and ran the Python code below, I always got the error "Operation failed: "This request is not authorized to perform this operation.", 403, GET".
    • dbutils.fs.ls("/Volumes/catalog1/schema1/volumn11")
    • dbutils.fs.ls("dbfs:/Volumes/catalog1/schema1/volumn11")
    • spark.read.format("csv").option("header","True").load("/Volumes/catalog1/schema1/volumn11/123.csv")
    • spark.read.format("csv").option("header","True").load("dbfs:/Volumes/catalog1/schema1/volumn11/123.csv")
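For reference, steps 9 to 12 can also be run from a notebook with Unity Catalog SQL. The sketch below assumes the same object names as above; the storage account and container parts of the abfss:// URLs are placeholders, since the real endpoints are not shown here.

    # Sketch: create the external locations, the catalog, the schema, and the managed volume.
    # <storage-account>, <container-catalog-default>, and <container-schema1> are placeholders.
    spark.sql("""
        CREATE EXTERNAL LOCATION IF NOT EXISTS dbr_ext_loc_catalog_default
        URL 'abfss://<container-catalog-default>@<storage-account>.dfs.core.windows.net/'
        WITH (STORAGE CREDENTIAL dbr_strg_cred_1)
    """)
    spark.sql("""
        CREATE EXTERNAL LOCATION IF NOT EXISTS dbr_ext_loc_schema1
        URL 'abfss://<container-schema1>@<storage-account>.dfs.core.windows.net/'
        WITH (STORAGE CREDENTIAL dbr_strg_cred_1)
    """)
    # The catalog and schema use those locations as their managed storage locations.
    spark.sql("CREATE CATALOG IF NOT EXISTS catalog1 MANAGED LOCATION "
              "'abfss://<container-catalog-default>@<storage-account>.dfs.core.windows.net/'")
    spark.sql("CREATE SCHEMA IF NOT EXISTS catalog1.schema1 MANAGED LOCATION "
              "'abfss://<container-schema1>@<storage-account>.dfs.core.windows.net/'")
    # A managed volume has no LOCATION clause; its files live under the schema's managed location.
    spark.sql("CREATE VOLUME IF NOT EXISTS catalog1.schema1.volumn11")
    # Quick check that the volume path is reachable from the cluster.
    display(dbutils.fs.ls("/Volumes/catalog1/schema1/volumn11"))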

Details about the All-purpose compute


Solution

  • I found the cause and a solution, but I feel this is a bug, and I wonder what the best practice is.

    When I set the ADLS Gen2 account's Public network access to Enabled from all networks, as shown below, I can access the volume from a notebook.

    (screenshot: Networking settings with Public network access set to Enabled from all networks)
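    The same toggle can also be set outside the portal. This is a minimal sketch with the azure-mgmt-storage Python SDK; the subscription ID, resource group, and storage account name are placeholders, not values from my setup.

        from azure.identity import DefaultAzureCredential
        from azure.mgmt.storage import StorageManagementClient
        from azure.mgmt.storage.models import NetworkRuleSet, StorageAccountUpdateParameters

        client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")
        # Equivalent of "Public network access: Enabled from all networks" in the portal.
        client.storage_accounts.update(
            "<resource-group>",
            "<storage-account>",
            StorageAccountUpdateParameters(
                public_network_access="Enabled",
                network_rule_set=NetworkRuleSet(default_action="Allow"),
            ),
        )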

    However, if I set the ADLS Gen2 account's Public network access to Enabled from selected virtual networks and IP addresses, as shown below, I cannot access the volume from a notebook. This happens even though I added the VM's public IP to the firewall whitelist, added the resource type Microsoft.Databricks/accessConnectors to the resource instances, and enabled the exception Allow Azure services on the trusted services list to access this storage account. As I understand it, because my compute has the Unity Catalog badge, it should reach the ADLS Gen2 account through the Access Connector for Azure Databricks (a managed identity), so the resource-instance rule should be enough for it to access the storage account.

    (screenshots: Networking settings with Public network access set to Enabled from selected virtual networks and IP addresses, showing the firewall IP whitelist, the Microsoft.Databricks/accessConnectors resource instance, and the trusted-services exception)
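    For reference, the locked-down configuration described above looks roughly like this, using the same SDK client as in the previous sketch; the tenant ID, IP address, and resource IDs are placeholders.

        from azure.mgmt.storage.models import (
            IPRule,
            NetworkRuleSet,
            ResourceAccessRule,
            StorageAccountUpdateParameters,
        )

        # "Enabled from selected virtual networks and IP addresses": default action Deny,
        # plus the IP whitelist entry, a resource-instance rule for the Databricks access
        # connector, and the trusted-services bypass.
        client.storage_accounts.update(
            "<resource-group>",
            "<storage-account>",
            StorageAccountUpdateParameters(
                public_network_access="Enabled",
                network_rule_set=NetworkRuleSet(
                    default_action="Deny",
                    bypass="AzureServices",
                    ip_rules=[IPRule(ip_address_or_range="<vm-public-ip>")],
                    resource_access_rules=[
                        ResourceAccessRule(
                            tenant_id="<tenant-id>",
                            resource_id=(
                                "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
                                "/providers/Microsoft.Databricks/accessConnectors/access_connector_for_dbr_1"
                            ),
                        )
                    ],
                ),
            ),
        )
        # With this configuration the notebook still returns the 403 above, even though the
        # access connector is listed as an allowed resource instance.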