I am trying to mount ADLS Gen 2 from my Databricks Community Edition workspace, but when I run the following code:
test = spark.read.csv("/mnt/lake/RAW/csds.csv", inferSchema=True, header=True)
I get the error:
com.databricks.rpc.UnknownRemoteException: Remote exception occurred:
I'm using the following code to mount ADLS Gen 2:
def check(mntPoint):
    a = []
    for test in dbutils.fs.mounts():
        a.append(test.mountPoint)
    result = a.count(mntPoint)
    return result

mount = "/mnt/lake"

if check(mount) == 1:
    resultMsg = "<div>%s is already mounted. </div>" % mount
else:
    dbutils.fs.mount(
        source = "wasbs://root@axxxxxxx.blob.core.windows.net",
        mount_point = mount,
        extra_configs = {"fs.azure.account.key.xxxxxxxx.blob.core.windows.net": ""})
    resultMsg = "<div>%s was mounted. </div>" % mount

displayHTML(resultMsg)
ServicePrincipalID = 'xxxxxxxxxxx'
ServicePrincipalKey = 'xxxxxxxxxxxxxx'
DirectoryID = 'xxxxxxxxxxxxxxx'
Lake = 'adlsgen2'

# Combine DirectoryID into the full OAuth token endpoint URL
Directory = "https://login.microsoftonline.com/{}/oauth2/token".format(DirectoryID)

# Create configurations for our connection
configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": ServicePrincipalID,
           "fs.azure.account.oauth2.client.secret": ServicePrincipalKey,
           "fs.azure.account.oauth2.client.endpoint": Directory}
mount = "/mnt/lake"
if check(mount)==1:
resultMsg = "<div>%s is already mounted. </div>" % mount
else:
dbutils.fs.mount(
source = f"abfss://root@{Lake}.dfs.core.windows.net/",
mount_point = mount,
extra_configs = configs)
resultMsg = "<div>%s was mounted. </div>" % mount
I then try to read a CSV file from ADLS Gen 2 into a dataframe using the following:
dataPath = "/mnt/lake/RAW/DummyEventData/Tools/"
test = spark.read.csv("/mnt/lake/RAW/csds.csv", inferSchema=True, header=True)
and I get the same error:

com.databricks.rpc.UnknownRemoteException: Remote exception occurred:
Any ideas?
Based on the stack trace, the most probable reason for that error is that your service principal doesn't have the Storage Blob Data Contributor (or Storage Blob Data Reader) role assigned on the storage account, as described in the documentation. This role is different from the usual "Contributor" role, and that's very confusing.
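Once the role is assigned (it can take a few minutes to propagate), you can sanity-check the service principal's access without going through the mount by setting the OAuth configs on the Spark session and reading the file directly over abfss. Below is a minimal sketch that reuses the account name (adlsgen2), container (root), file path and credential variables from your question; adjust them to your real values.

# Configure OAuth for direct abfss access (no mount), reusing the variables from your notebook
spark.conf.set("fs.azure.account.auth.type.adlsgen2.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.adlsgen2.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.adlsgen2.dfs.core.windows.net", ServicePrincipalID)
spark.conf.set("fs.azure.account.oauth2.client.secret.adlsgen2.dfs.core.windows.net", ServicePrincipalKey)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.adlsgen2.dfs.core.windows.net",
               "https://login.microsoftonline.com/{}/oauth2/token".format(DirectoryID))

# If this read succeeds, the role assignment is working
test = spark.read.csv("abfss://root@adlsgen2.dfs.core.windows.net/RAW/csds.csv",
                      inferSchema=True, header=True)

# Optionally, if reads through /mnt/lake still fail afterwards, drop the mount
# and re-run your mount cell so it's recreated with the working configuration
dbutils.fs.unmount("/mnt/lake")

If the direct abfss read works but the mounted path still errors, recreating the mount as in the last step and re-running your read of /mnt/lake/RAW/csds.csv should resolve it.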