apache-spark, pyspark, databricks, azure-data-lake-gen2, databricks-community-edition

Unable to mount Azure ADLS Gen 2 from Databricks Community Edition: com.databricks.rpc.UnknownRemoteException: Remote exception occurred


I am trying to mount ADLS Gen 2 from my Databricks Community Edition workspace, but when I run the following code:

test = spark.read.csv("/mnt/lake/RAW/csds.csv", inferSchema=True, header=True)

I get the error:

com.databricks.rpc.UnknownRemoteException: Remote exception occurred:

I'm using the following code to mount ADLS Gen 2:

def check(mntPoint):
  # Count how many existing mounts match the given mount point
  a = []
  for test in dbutils.fs.mounts():
    a.append(test.mountPoint)
  result = a.count(mntPoint)
  return result

mount = "/mnt/lake"

if check(mount)==1:
  resultMsg = "<div>%s is already mounted. </div>" % mount
else:
  dbutils.fs.mount(
  source = "wasbs://root@axxxxxxx.blob.core.windows.net",
  mount_point = mount,
  extra_configs = {"fs.azure.account.key.xxxxxxxx.blob.core.windows.net":""})
  resultMsg = "<div>%s was mounted. </div>" % mount

displayHTML(resultMsg)


ServicePrincipalID = 'xxxxxxxxxxx'
ServicePrincipalKey = 'xxxxxxxxxxxxxx'
DirectoryID =  'xxxxxxxxxxxxxxx'
Lake =  'adlsgen2'


# Build the OAuth token endpoint URL from the DirectoryID (tenant ID)
Directory = "https://login.microsoftonline.com/{}/oauth2/token".format(DirectoryID)

# Create configurations for our connection
configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": ServicePrincipalID,
           "fs.azure.account.oauth2.client.secret": ServicePrincipalKey,
           "fs.azure.account.oauth2.client.endpoint": Directory}



mount = "/mnt/lake"

if check(mount)==1:
  resultMsg = "<div>%s is already mounted. </div>" % mount
else:
  dbutils.fs.mount(
  source = f"abfss://root@{Lake}.dfs.core.windows.net/",
  mount_point = mount,
  extra_configs = configs)
  resultMsg = "<div>%s was mounted. </div>" % mount

I then try to read a DataFrame from ADLS Gen 2 using the following:

dataPath = "/mnt/lake/RAW/DummyEventData/Tools/"

test = spark.read.csv("/mnt/lake/RAW/csds.csv", inferSchema=True, header=True)

and get the same error:

com.databricks.rpc.UnknownRemoteException: Remote exception occurred:

Any ideas?


Solution

  • Based on the stack trace, the most likely reason for this error is that your service principal does not have the Storage Blob Data Contributor (or Storage Blob Data Reader) role assigned on the storage account, as described in the documentation. This role is different from the usual "Contributor" role, which is a common source of confusion. Once the role is granted, see the remount-and-retry sketch below.
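
After you assign the role (in the Azure portal or via the Azure CLI), it can take a few minutes to propagate, and if the mount was created while access was still denied it may be worth recreating it before retrying the read. A minimal sketch, reusing the mount, Lake, and configs names from the question (the role assignment itself is not shown here):

mount = "/mnt/lake"

# Drop the stale mount, if present, so the remount uses the updated permissions
if any(m.mountPoint == mount for m in dbutils.fs.mounts()):
  dbutils.fs.unmount(mount)

# Remount with the same OAuth configs as in the question
dbutils.fs.mount(
  source = f"abfss://root@{Lake}.dfs.core.windows.net/",
  mount_point = mount,
  extra_configs = configs)

# Listing the container should now succeed before reading the CSV
display(dbutils.fs.ls("/mnt/lake/RAW"))
test = spark.read.csv("/mnt/lake/RAW/csds.csv", inferSchema=True, header=True)

If the listing still fails after a few minutes, double-check that the role was assigned on the correct storage account (or container) and that the service principal credentials in configs are the ones that were granted the role.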