azure, azure-databricks

Databricks Operation failed: "Forbidden", 403 when attempting to access Azure Fabric OneLake


Using the following PySpark code, I have successfully managed to mount an Azure OneLake storage account. However, when I attempt to list the mounted path using display(dbutils.fs.ls('/mnt/lake')), I get the following error:

Operation failed: "Forbidden", 403, GET, https://onelake.dfs.fabric.microsoft.com/DataEngineeringWKSP?upn=false&resource=filesystem&maxResults=5000&directory=my_lakehouse.Lakehouse&timeout=90&recursive=false, Forbidden, "User is not authorized to perform current operation for workspace 'xxxxxx-ad19-489b-944e-82d6fc013b87', artifact 'xxxxx-3c39-44b8-8982-ddecef9e829c'."

I get a similar error when I attempt to read files in the OneLake account:

Operation failed: "Forbidden", 403, HEAD, https://onelake.dfs.fabric.microsoft.com/DataEngineeringWKSP/sqlite_lakehouse.Lakehouse/Files/expdata.csv?upn=false&action=getStatus&timeout=90

The code I used to mount the OneLake storage account is as follows:

url = "abfss://DataEngineeringWKSP@onelake.dfs.fabric.microsoft.com/sqlite_lakehouse.Lakehouse"
mount_folder = "/mnt/lake"

# OAuth configuration settings for OneLake
configs = {
    "fs.azure.account.auth.type.onelake.dfs.fabric.microsoft.com": "OAuth",
    "fs.azure.account.oauth.provider.type.onelake.dfs.fabric.microsoft.com": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id.onelake.dfs.fabric.microsoft.com": "xxxxxx-a061-4899-994b-81253d864bc8",
    "fs.azure.account.oauth2.client.secret.onelake.dfs.fabric.microsoft.com": "xxxxxx~1Q.B-Ey12zs066D_G3.E6bslnE_LqY-aFs",
    "fs.azure.account.oauth2.client.endpoint.onelake.dfs.fabric.microsoft.com": "https://login.microsoftonline.com/xxxxxxxxxxxxxf12fc6/oauth2/token"
}

mounted_list = dbutils.fs.mounts()
mounted_exist = False

for item in mounted_list:
    if item.mountPoint == mount_folder:  # exact match, not a substring check
        mounted_exist = True
        break

if not mounted_exist:
    dbutils.fs.mount(source=url, mount_point=mount_folder, extra_configs=configs)

I believe I need to add permissions in the Azure Fabric workspace, but I'm struggling to locate exactly where to add them.


Solution

  • ERROR: Operation failed: "Forbidden", 403, GET, https://onelake.dfs.fabric.microsoft.com/DataEngineeringWKSP?upn=false&resource=filesystem&maxResults=5000&directory=my_lakehouse.Lakehouse&timeout=90&recursive=false, Forbidden, "User is not authorized to perform current operation for workspace 'xxxxxx-ad19-489b-944e-82d6fc013b87', artifact 'xxxxx-3c39-44b8-8982-ddecef9e829c'."

    This means that the identity making the request does not have permission to access the specified OneLake workspace or Lakehouse artifact.

    Since you want to grant the permission to a service principal, you can follow the steps below:

    First, in the Fabric Admin portal, enable the tenant setting that allows service principals to use Fabric APIs (Admin portal > Tenant settings > Developer settings).


    Now that the feature is enabled, you can grant a service principal or managed identity access to your Fabric workspace. To do this, navigate to the desired workspace and select Manage access.


    Next, click + Add people or groups, then type the name of your service principal or managed identity. When it appears in the suggestions, select it and click Add. If you prefer to script this step instead of using the UI, see the sketch below.

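    A minimal sketch of scripting the role assignment against the Fabric REST API's Add Workspace Role Assignment endpoint. The workspace ID is taken from the 403 error message; the service principal's object ID and the Contributor role are placeholders you should adjust:

    import requests
    from azure.identity import InteractiveBrowserCredential

    # Sign in as a user who can manage access on the workspace
    cred = InteractiveBrowserCredential()
    token = cred.get_token("https://api.fabric.microsoft.com/.default").token

    workspace_id = "xxxxxx-ad19-489b-944e-82d6fc013b87"  # from the 403 error
    resp = requests.post(
        f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/roleAssignments",
        headers={"Authorization": f"Bearer {token}"},
        json={
            # use the service principal's *object* ID here, not its appId
            "principal": {"id": "<sp-object-id>", "type": "ServicePrincipal"},
            "role": "Contributor",
        },
    )
    resp.raise_for_status()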

    Available workspace roles are Admin, Member, Contributor, and Viewer; to read OneLake files directly, grant the service principal at least the Contributor role. Once access is granted, the mount from the question should work, as in the check below.
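
    A quick verification from the Databricks side, reusing the mount and paths from the question (the header option is an assumption about the CSV):

    # Re-run the mount code from the question, then verify the listing works
    display(dbutils.fs.ls("/mnt/lake"))

    # Or skip the mount and read directly over abfss by applying the same
    # OAuth options from the question's `configs` dict to the Spark session
    for key, value in configs.items():
        spark.conf.set(key, value)

    df = spark.read.option("header", True).csv(
        "abfss://DataEngineeringWKSP@onelake.dfs.fabric.microsoft.com/"
        "sqlite_lakehouse.Lakehouse/Files/expdata.csv")
    display(df)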

    If you still need to create a service principal, you can use the Azure CLI command below:

    az ad sp create-for-rbac -n "Fabricator"
    

    This command will generate a new service principal named Fabricator and display the following details:

    {
      "appId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
      "displayName": "Fabricator",
      "password": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
      "tenant": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    }
    

    The appId represents the client ID, and the password corresponds to the client secret. These credentials are used to authenticate to OneLake via the Python SDK.
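
    For the Databricks mount in the question, the same values slot into the OAuth configs (angle-bracket placeholders mark where the CLI output goes):

    configs = {
        "fs.azure.account.auth.type.onelake.dfs.fabric.microsoft.com": "OAuth",
        "fs.azure.account.oauth.provider.type.onelake.dfs.fabric.microsoft.com":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        # appId from the CLI output
        "fs.azure.account.oauth2.client.id.onelake.dfs.fabric.microsoft.com": "<appId>",
        # password from the CLI output
        "fs.azure.account.oauth2.client.secret.onelake.dfs.fabric.microsoft.com": "<password>",
        # tenant from the CLI output
        "fs.azure.account.oauth2.client.endpoint.onelake.dfs.fabric.microsoft.com":
            "https://login.microsoftonline.com/<tenant>/oauth2/token",
    }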

    You will need to install the required packages:

    pip install azure-identity azure-storage-file-datalake
    

    With the necessary packages installed, you can use the following code snippet to authenticate and connect to OneLake.

    from azure.identity import ClientSecretCredential
    from azure.storage.filedatalake import FileSystemClient

    cred = ClientSecretCredential(tenant_id="<your-tenant-id>",
                                  client_id="<your-client-id>",
                                  client_secret="<your-client-secret>")
    
    # The "file system" is the Fabric workspace name
    file_system_client = FileSystemClient(
        account_url="https://onelake.dfs.fabric.microsoft.com",
        file_system_name="<name-of-the-workspace-you-want-to-access>",
        credential=cred)

    # List everything under the Lakehouse's Tables folder
    paths = file_system_client.get_paths(path="/<name-of-the-lakehouse>.Lakehouse/Tables/")
    for p in paths:
        print(p.name)
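
    If the listing succeeds, reading a file works the same way. A minimal sketch that downloads the expdata.csv file from the question (the Files/... path is inferred from the error URL):

    # Reuses `file_system_client` from the snippet above
    file_client = file_system_client.get_file_client(
        "<name-of-the-lakehouse>.Lakehouse/Files/expdata.csv")
    data = file_client.download_file().readall()
    print(data[:200])  # first 200 bytes of the CSV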
    

    Reference: Thanks to @Sam Debruyn for the excellent article on How to use service principal authentication to access Microsoft Fabric's OneLake.