Using the following PySpark code I have successfully mounted an Azure OneLake storage account. However, when I attempt to list the mounted path using display(dbutils.fs.ls('/mnt/lake'))
I get the following error:
Operation failed: "Forbidden", 403, GET, https://onelake.dfs.fabric.microsoft.com/DataEngineeringWKSP?upn=false&resource=filesystem&maxResults=5000&directory=my_lakehouse.Lakehouse&timeout=90&recursive=false, Forbidden, "User is not authorized to perform current operation for workspace 'xxxxxx-ad19-489b-944e-82d6fc013b87', artifact 'xxxxx-3c39-44b8-8982-ddecef9e829c'."
I get a similar error when I attempt to read files in the OneLake account:
Operation failed: "Forbidden", 403, HEAD, https://onelake.dfs.fabric.microsoft.com/DataEngineeringWKSP/sqlite_lakehouse.Lakehouse/Files/expdata.csv?upn=false&action=getStatus&timeout=90
The code I used to mount the OneLake storage account is as follows:
url = "abfss://DataEngineeringWKSP@onelake.dfs.fabric.microsoft.com/sqlite_lakehouse.Lakehouse"
mount_folder = "/mnt/lake"
# OAuth configuration settings for OneLake
configs = {
    "fs.azure.account.auth.type.onelake.dfs.fabric.microsoft.com": "OAuth",
    "fs.azure.account.oauth.provider.type.onelake.dfs.fabric.microsoft.com": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id.onelake.dfs.fabric.microsoft.com": "xxxxxx-a061-4899-994b-81253d864bc8",
    "fs.azure.account.oauth2.client.secret.onelake.dfs.fabric.microsoft.com": "xxxxxx~1Q.B-Ey12zs066D_G3.E6bslnE_LqY-aFs",
    "fs.azure.account.oauth2.client.endpoint.onelake.dfs.fabric.microsoft.com": "https://login.microsoftonline.com/xxxxxxxxxxxxxf12fc6/oauth2/token"
}

# Mount only if the mount point does not already exist
mounted_list = dbutils.fs.mounts()
mounted_exist = False
for item in mounted_list:
    if mount_folder in item.mountPoint:
        mounted_exist = True
        break

if not mounted_exist:
    dbutils.fs.mount(source=url, mount_point=mount_folder, extra_configs=configs)
I believe I need to add permissions in the Microsoft Fabric workspace, but I'm struggling to locate exactly where to add them.
The error:
Operation failed: "Forbidden", 403, GET, https://onelake.dfs.fabric.microsoft.com/DataEngineeringWKSP?upn=false&resource=filesystem&maxResults=5000&directory=my_lakehouse.Lakehouse&timeout=90&recursive=false, Forbidden, "User is not authorized to perform current operation for workspace 'xxxxxx-ad19-489b-944e-82d6fc013b87', artifact 'xxxxx-3c39-44b8-8982-ddecef9e829c'."
means that the identity you are authenticating with does not have permission to access the specified OneLake workspace or Lakehouse artifact.
Since you want to grant the permission to the service principal, you can follow the steps below:
First, in the Fabric Admin portal, enable the tenant setting that allows service principals to use Fabric APIs (under Developer settings).
Next, grant access to the service principal. Now that the feature is enabled, you can give a service principal or managed identity access to your Fabric workspace. Navigate to the desired workspace and select Manage access.
Then click + Add people or groups, type the name of your service principal or managed identity, select it when it appears in the suggestions, and click Add.
Available roles:
Admin: Grants full control over the Workspace, including deletion rights. It's generally not recommended for service principals due to its high level of access.
Member: Provides nearly full access, excluding the ability to modify workspace settings or manage access permissions.
Contributor (recommended): Allows most actions, such as managing content, but cannot modify member access or share items. This role offers the minimum required privileges for accessing OneLake via the API.
Viewer: Offers read-only access to view workspace items, run SQL queries, and execute pipelines. Note that this role does not allow API access to OneLake.
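The same role assignment can also be made programmatically via the Fabric REST API's workspace role-assignment endpoint. A minimal sketch (the workspace ID, principal object ID, and bearer token are placeholders you must supply; the call must be made by an identity that can manage workspace access):

```python
import json
import urllib.request

FABRIC_API = "https://api.fabric.microsoft.com/v1"

def role_assignment_payload(principal_object_id: str, role: str = "Contributor") -> dict:
    # Body for POST /workspaces/{workspaceId}/roleAssignments:
    # the principal's Entra object ID plus the workspace role to grant
    return {
        "principal": {"id": principal_object_id, "type": "ServicePrincipal"},
        "role": role,
    }

def add_workspace_role(workspace_id: str, principal_object_id: str,
                       token: str, role: str = "Contributor") -> dict:
    # token is a bearer token for https://api.fabric.microsoft.com
    req = urllib.request.Request(
        f"{FABRIC_API}/workspaces/{workspace_id}/roleAssignments",
        data=json.dumps(role_assignment_payload(principal_object_id, role)).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Contributor matches the recommendation above; pass a different role string if you need Member or Viewer.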
Next, create a service principal using the Azure CLI:
`az ad sp create-for-rbac -n "Fabricator"`
This command will generate a new service principal named Fabricator and display the following details:
{
  "appId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "displayName": "Fabricator",
  "password": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "tenant": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
The appId represents the client ID, and the password corresponds to the client secret. These credentials are used to authenticate to OneLake via the Python SDK.
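If you want to keep the Databricks mount from the question, the same appId/password pair also plugs into the OAuth extra_configs there. A small helper along these lines (the key names mirror the configs in the question; the helper itself is just an illustration) avoids repeating the long key names:

```python
def onelake_oauth_configs(client_id: str, client_secret: str, tenant_id: str) -> dict:
    """Build the Hadoop ABFS OAuth settings for OneLake.

    client_id is the appId and client_secret the password returned by
    `az ad sp create-for-rbac`.
    """
    host = "onelake.dfs.fabric.microsoft.com"
    return {
        f"fs.azure.account.auth.type.{host}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{host}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{host}": client_id,
        f"fs.azure.account.oauth2.client.secret.{host}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{host}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }

# Usage inside Databricks (placeholders for your values):
# configs = onelake_oauth_configs("<appId>", "<password>", "<tenant>")
# dbutils.fs.mount(source=url, mount_point=mount_folder, extra_configs=configs)
```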
You will need to install the packages:
pip install azure-identity azure-storage-file-datalake
With the necessary packages installed, you can use the following code snippet to authenticate and connect to OneLake.
from azure.identity import ClientSecretCredential
from azure.storage.filedatalake import FileSystemClient

cred = ClientSecretCredential(
    tenant_id="<your-tenant-id>",
    client_id="<your-client-id>",
    client_secret="<your-client-secret>")

file_system_client = FileSystemClient(
    account_url="https://onelake.dfs.fabric.microsoft.com",
    file_system_name="<name-of-the-workspace-you-want-to-access>",
    credential=cred)

paths = file_system_client.get_paths(path="/<name-of-the-lakehouse>.Lakehouse/Tables/")
for p in paths:
    print(p.name)
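Once listing works, the same client can read individual files, e.g. the Files/expdata.csv path from the error message in the question. A sketch, assuming the file_system_client built above (the path-building helper is just an illustration):

```python
def lakehouse_file_path(lakehouse: str, *parts: str) -> str:
    """Build an object path like 'sqlite_lakehouse.Lakehouse/Files/expdata.csv'."""
    return "/".join([f"{lakehouse}.Lakehouse", "Files", *parts])

def read_lakehouse_file(fs_client, lakehouse: str, *parts: str) -> bytes:
    # fs_client is the FileSystemClient from above; download_file() returns
    # a StorageStreamDownloader and readall() pulls the file's bytes
    file_client = fs_client.get_file_client(lakehouse_file_path(lakehouse, *parts))
    return file_client.download_file().readall()

# Usage (placeholders matching the question):
# data = read_lakehouse_file(file_system_client, "sqlite_lakehouse", "expdata.csv")
```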
Reference: Thanks to @Sam Debruyn for the excellent article "How to use service principal authentication to access Microsoft Fabric's OneLake".