databricks, azure-databricks, service-principal

Azure Databricks job fails to access ADLS storage after renewing service principal


The Databricks job used to connect to ADLS Gen2 storage and process the files successfully.

Recently, after renewing the service principal secret and updating it in Key Vault, the jobs started failing.

Using the Databricks CLI (databricks secrets list-scopes --profile mycluster), I was able to identify which key vault is being used, and I verified that the corresponding secrets are updated correctly.
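As an additional check from within the workspace, the keys available in the secret scope can be listed with dbutils (a minimal sketch; the scope and key names below are placeholders, and secret values are shown as [REDACTED] if printed):

# List the keys available in the Key Vault-backed secret scope
scopename = "name-of-the-scope-used-in-databricks-workspace"
for secret in dbutils.secrets.list(scopename):
    print(secret.key)

# Fetching a value confirms the key resolves; printing it only shows [REDACTED]
_ = dbutils.secrets.get(scope=scopename, key="name-of-the-key-from-keyvault-referring-appid")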

Within the notebook, I followed the link and was able to access the ADLS storage.

Below is the code I used to test the key vault values and access the ADLS storage.

scopename="name-of-the-scope-used-in-databricks-workspace"

appId=dbutils.secrets.get(scope=scopename,key="name-of-the-key-from-keyvault-referring-appid")
directoryId=dbutils.secrets.get(scope=scopename,key="name-of-key-from-keyvault-referring-TenantId")
secretValue=dbutils.secrets.get(scope=scopename,key="name-of-key-from-keyvaut-referring-Secretkey")
storageAccount="ADLS-Gen2-StorageAccountName"

spark.conf.set(f"fs.azure.account.auth.type.{storageAccount}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storageAccount}.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storageAccount}.dfs.core.windows.net", appid)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storageAccount}.dfs.core.windows.net", secretValue)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storageAccount}.dfs.core.windows.net", f"https://login.microsoftonline.com/{directoryid}/oauth2/token")
dbutils.fs.ls("abfss://<container-name>@<storage-accnt-name>.dfs.core.windows.net/<folder>")

With an attached cluster, the above successfully displays the list of folders/files within the ADLS Gen2 storage.

Below is the code that was used to create the mount point, which used the old secret info.

scope_name="name-of-the-scope-from-workspace"
directoryId=dbutils.secrets.get(scope=scope_name, key="name-of-key-from-keyvault-which-stores-tenantid-value")
configs = {"fs.azure.account.auth.type": "OAuth",
          "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
          "fs.azure.account.oauth2.client.id": dbutils.secrets.get(scope=scope_name, key="name-of-key-from-key-vault-referring-to-clientid"),
          "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope=scope_name, key="name-of-key-from-key-vault-referring-to-secretvalue-generated-in-sp-secrets"),
          "fs.azure.account.oauth2.client.endpoint": f"https://login.microsoftonline.com/{directoryId}/oauth2/token"}

storage_acct_name="storageaccountname"
container_name="name-of-container"

mount_point = "/mnt/appadls/content"
if not any(mount.mountPoint == mount_point for mount in dbutils.fs.mounts()):
  print(f"Mounting {mount_point} to DBFS filesystem")
  dbutils.fs.mount(
    source = f"abfss://{container_name}@{storage_acct_name}.dfs.core.windows.net/",
    mount_point = mount_point,
    extra_configs = configs)
else:
  print("Mount point {mount_point} has already been mounted.")

In my case the key vault is updated with the client ID, tenant/directory ID, and SP secret key.

After renewing the service principal, when accessing the /mnt/ path, I see the exception below.

...
response '{"error":"invalid_client","error_description":"AADSTS7000215: Invalid client secret is provided.

The only thing I can think of is that the mount point was created with the old secrets, as in the code above. After renewing the service principal, do I need to unmount and re-create the mount point?
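For reference, the existing mounts can be listed from a notebook to confirm the mount point in question (a minimal sketch; dbutils.fs.mounts() shows the mount point and source, but not the OAuth configs it was created with):

# Show current DBFS mounts and their backing sources
for m in dbutils.fs.mounts():
    print(m.mountPoint, "->", m.source)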


Solution

  • So I finally tried to unmount and re-mount the ADLS Gen2 storage, and now I am able to access it.

    I didn't expect that the configuration would be persisted; I assumed that just updating the service principal secret in Key Vault would be sufficient.
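    For anyone hitting the same issue, below is a minimal sketch of the unmount/re-mount step, reusing the mount_point, configs, container_name and storage_acct_name variables from the mount code in the question (the configs dict now resolves to the renewed secret from Key Vault):

    # Drop the mount that was created with the old service principal secret
    if any(mount.mountPoint == mount_point for mount in dbutils.fs.mounts()):
      dbutils.fs.unmount(mount_point)

    # Re-create the mount so the renewed secret is picked up
    dbutils.fs.mount(
      source = f"abfss://{container_name}@{storage_acct_name}.dfs.core.windows.net/",
      mount_point = mount_point,
      extra_configs = configs)

    # Optionally refresh the mount cache on already-running clusters
    dbutils.fs.refreshMounts()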