Looking for a way using Azure files SDK to upload files to my azure databricks blob storage
I tried many things using function from this page
But nothing worked. I don't understand why
example:
file_service = FileService(account_name='MYSECRETNAME', account_key='mySECRETkey')
generator = file_service.list_directories_and_files('MYSECRETNAME/test') #listing file in folder /test, working well
for file_or_dir in generator:
print(file_or_dir.name)
file_service.get_file_to_path('MYSECRETNAME','test/tables/input/referentials/','test.xlsx','/dbfs/FileStore/test6.xlsx')
with test.xlsx = name of file in my azure file
/dbfs/FileStore/test6.xlsx => path where to upload the file in my dbfs system
I have the error message:
Exception=The specified resource name contains invalid characters
Tried to change the name but doesn't seem to work
edit: I'm not even sure the function is doing what I want. What is the best way to load file from azure files?
Per my experience, I think the best way to load file from Azure Files is directly to read a file via its url with sas token.
For example, as the figures below, it's a file named test.xlsx
in my test
file share, that I viewed it using Azure Storage Explorer, then to generate its url with sas token.
Fig 1. Right click the file and then click the Get Shared Access Signature...
Fig 2. Must select the option Read
permission for directly reading the file content.
Fig 3. Copy the url with sas token
Here is my sample code, you can run it with the sas token url of your file in your Azure Databricks.
import pandas as pd
url_sas_token = 'https://<my account name>.file.core.windows.net/test/test.xlsx?st=2020-01-27T10%3A16%3A12Z&se=2020-01-28T10%3A16%3A12Z&sp=rl&sv=2018-03-28&sr=f&sig=XXXXXXXXXXXXXXXXX'
# Directly read the file content from its url with sas token to get a pandas dataframe
pdf = pd.read_excel(url_sas_token )
# Then, to convert the pandas dataframe to a PySpark dataframe in Azure Databricks
df = spark.createDataFrame(pdf)
Alternatively, to use Azure File Storage SDK to generate the url with sas token for your file or to get the bytes of your file for reading, please refer to the offical document Develop for Azure Files with Python
and my sample code below.
# Create a client of Azure File Service as same as yours
from azure.storage.file import FileService
account_name = '<your account name>'
account_key = '<your account key>'
share_name = 'test'
directory_name = None
file_name = 'test.xlsx'
file_service = FileService(account_name=account_name, account_key=account_key)
To generate the sas token url of a file
from azure.storage.file import FilePermissions
from datetime import datetime, timedelta
sas_token = file_service.generate_file_shared_access_signature(share_name, directory_name, file_name, permission=FilePermissions.READ, expiry=datetime.utcnow() + timedelta(hours=1))
url_sas_token = f"https://{account_name}.file.core.windows.net/{share_name}/{file_name}?{sas_token}"
import pandas as pd
pdf = pd.read_excel(url_sas_token)
df = spark.createDataFrame(pdf)
Or using get_file_to_stream
function to read the file content
from io import BytesIO
import pandas as pd
stream = BytesIO()
file_service.get_file_to_stream(share_name, directory_name, file_name, stream)
pdf = pd.read_excel(stream)
df = spark.createDataFrame(pdf)