I can currently access files in Blob Storage, but I do not know whether it is a Gen2 or a plain Blob Storage linked service. I need to connect to ADLS Gen2 and set up the linked service using SAS, a service principal, or a managed identity. How do I go about this in a Python notebook? I am a newbie at this.
I was given this; does it work?
spark.conf.set("fs.azure.account.auth.type.account.dfs.core.windows.net", "SAS")
spark.conf.set("fs.azure.sas.token.provider.type.account.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
spark.conf.set("fs.azure.sas.fixed.token.acount.dfs.core.windows.net", "token")
You can mount the required container using a SAS token or the storage account key to achieve this.
As per this documentation, mounting the container using a linked service is better in terms of security.
First, go to Manage -> Linked services -> New -> Azure Data Lake Storage Gen2 to create the linked service. Here, you can choose the authentication method you need (for example, a service principal or managed identity).
After creating the linked service, you can use the code from the same documentation to mount the storage account container in the Synapse notebook using the ADLS Gen2 linked service. The mount call returns True,
and then you can access the required files in the container.
You can use f'file:{mssparkutils.fs.getMountPath("/mount1")}'
to get the local mount path. Here is a sample demo in which the code is taken from the same documentation.
# Mount the ADLS Gen2 container via the linked service created above
mssparkutils.fs.mount(
    "abfss://con1@laddugen2.dfs.core.windows.net",  # container@storage-account
    "/mount1",                                      # mount point name
    {"linkedService": "My_ADLS_LS"}                 # name of the linked service
)
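If you want to sanity-check the mount before reading from it, you can print the local path it resolves to (this line is my addition, not part of the quoted demo):

print(mssparkutils.fs.getMountPath("/mount1"))  # e.g. /synfs/<job-id>/mount1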
# Read a CSV from the mounted container through the local file API
df = spark.read.load(f'file:{mssparkutils.fs.getMountPath("/mount1")}/DS_STORAGE_2024.csv', format='csv')
df.show()
Output: (screenshot of the df.show() result)
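When you no longer need the container, you can detach it with the unmount helper from the same mssparkutils.fs API (not shown in the documentation snippet above):

mssparkutils.fs.unmount("/mount1")  # detach the mount point when it is no longer needed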