I'm accessing files in normal storage with the following method:
import os

input_path = "my_path"
file_name = "file.mp3"
path = os.path.join(input_path, file_name)
full_path = '/dbfs/' + path

with open(full_path, mode='rb') as f:  # b is important -> binary
    fileContent = f.read()
I am not able to use the same method with the sensitive storage. I am aware that the sensitive storage has a different way to access the data:

path_sensitive_storage = 'mypath_sensitive'

If I use Spark it works perfectly, but I am interested in not using spark.read and instead opening the file directly:
input_df = (spark.read
    .format("binaryFile")
    .option("header", "true")
    .option("encoding", "UTF-8")
    .csv(full_path)
)
Is there a way to do that?
Since you are using Azure Data Lake as the source, you need to mount the container in Databricks DBFS using the OAuth method. Once the container is mounted, you can read its files with regular Python file APIs.
Use the code below to mount the container.
configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": "ba219eb4-0250-4780-8bd3-d7f3420dab6d",
           "fs.azure.account.oauth2.client.secret": "0wP8Q~qWUwGSFrjyByvwK-.HjrHx2EEvG06X9cmy",
           "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/72f988bf-86f1-41af-91ab-2d7cd011db47/oauth2/token",
           "fs.azure.createRemoteFileSystemDuringInitialization": "true"}
dbutils.fs.mount(
    source = "abfss://sample11@utrolicstorage11.dfs.core.windows.net/",
    mount_point = "/mnt/sampledata11",
    extra_configs = configs)
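If you would rather not hard-code the service principal credentials in the notebook, you can read them from a Databricks secret scope instead. This is a minimal sketch, assuming a secret scope named "my-scope" with keys "client-id" and "client-secret" already exists:

client_id = dbutils.secrets.get(scope="my-scope", key="client-id")          # assumed scope/key names
client_secret = dbutils.secrets.get(scope="my-scope", key="client-secret")  # assumed scope/key names

configs = {"fs.azure.account.auth.type": "OAuth",
           "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
           "fs.azure.account.oauth2.client.id": client_id,
           "fs.azure.account.oauth2.client.secret": client_secret,
           "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<your-tenant-id>/oauth2/token"}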
Once mounted, you can use the code below to list the files in the mounted location.
dbutils.fs.ls("/mnt/sampledata11/")
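dbutils.fs.ls returns FileInfo objects with path, name, and size fields, so you can filter the listing in Python; for example, to keep only the .mp3 files the question works with:

# Keep only the .mp3 files in the mounted location
mp3_files = [f.path for f in dbutils.fs.ls("/mnt/sampledata11/") if f.name.endswith(".mp3")]
print(mp3_files)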
And finally, use the with open statement to read the file:
with open("/dbfs/mnt/sampledata11/movies.csv", mode='rb') as file:  # b is important -> binary
    fileContent = file.read()
    print(fileContent)
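The same pattern maps straight back onto the loop in the question; a short sketch, assuming the mount above and a hypothetical file name:

import os

mount_root = "/dbfs/mnt/sampledata11"   # local POSIX view of the mount created above
file_name = "file.mp3"                  # hypothetical file name from the question
full_path = os.path.join(mount_root, file_name)

with open(full_path, mode='rb') as f:   # binary mode, as in the question
    fileContent = f.read()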