We created a Delta table in Azure Databricks, with the Parquet files stored in Azure Storage. Is this data read over the public network or the Microsoft network? At present, neither Azure Storage nor Azure Databricks is in any of our VNets. Will adding both of them to a VNet improve read speed? Will creating a private endpoint on Azure Storage ensure that reads go through the Microsoft network?
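If your Azure Storage account and Azure Databricks workspace are not in any VNet, the data is read through the storage account's public endpoint. Microsoft documents that traffic between Azure services is routed over the Azure backbone rather than the public internet, but the endpoint itself still has a public address and remains reachable from the internet. To make sure that the data read happens privately over the Microsoft network, you can use Azure Private Link to create private endpoints for both Azure Storage and Azure Databricks.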
Creating private endpoints will make sure that the traffic between Azure Databricks and Azure Storage remains within the Microsoft network, which can improve security and potentially improve read speed by avoiding the public internet.
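One way to check which path a read takes is to resolve the storage hostname from a notebook in the workspace. The sketch below assumes the ADLS Gen2 (`dfs`) endpoint and a private DNS zone linked to the VNet; the function names are illustrative, not part of any Azure or Databricks API.

```python
import ipaddress
import socket


def is_private_ip(ip: str) -> bool:
    """Return True if Python's ipaddress module classifies ip as private (e.g. 10.0.0.0/8)."""
    return ipaddress.ip_address(ip).is_private


def storage_endpoint_ip(account_name: str) -> str:
    """Resolve the dfs hostname of a storage account to an IPv4 address."""
    host = f"{account_name}.dfs.core.windows.net"
    return socket.gethostbyname(host)


def describe_endpoint(ip: str) -> str:
    """Describe which kind of endpoint an already-resolved address points to."""
    if is_private_ip(ip):
        return f"{ip}: private address (private endpoint)"
    return f"{ip}: public address (public endpoint)"
```

In a notebook cell, `print(describe_endpoint(storage_endpoint_ip("<storage-account>")))` reports the result for your own account. With service endpoints instead of a private endpoint, the name still resolves to a public address, so this check only confirms the private endpoint case.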
ADLS Gen2 operates on a shared architecture. To securely access it from Azure Databricks, there are two available options: service endpoints and Azure Private Link (private endpoints).
You can choose either of these approaches. Securing access between Azure Databricks (ADB) and ADLS Gen2 requires the ADB workspace to be VNet-injected, regardless of the approach used.
When a storage account is configured with a private endpoint, a firewall is enabled by default. To allow access, the VNet and subnets used by Databricks must be added to the firewall settings.
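As a rough illustration of that firewall setting, the sketch below builds the `networkAcls` section as it appears in a storage account's ARM definition: deny by default, then allow the two subnets of a VNet-injected workspace. The subscription, resource group, VNet, and subnet names are placeholders, and the helper functions are hypothetical.

```python
import json


def subnet_id(subscription_id: str, resource_group: str, vnet: str, subnet: str) -> str:
    """Build the ARM resource ID of a subnet."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Network/virtualNetworks/{vnet}/subnets/{subnet}"
    )


def network_acls(subnet_ids: list) -> dict:
    """Deny traffic by default and allow only the listed subnets."""
    return {
        "defaultAction": "Deny",
        "bypass": "AzureServices",
        "virtualNetworkRules": [{"id": sid, "action": "Allow"} for sid in subnet_ids],
        "ipRules": [],
    }


# Placeholder names: a VNet-injected workspace uses a host and a container subnet.
subnets = [
    subnet_id("<subscription-id>", "<resource-group>", "<vnet>", "<databricks-host-subnet>"),
    subnet_id("<subscription-id>", "<resource-group>", "<vnet>", "<databricks-container-subnet>"),
]
acls = network_acls(subnets)
print(json.dumps(acls, indent=2))
```

Virtual network rules of this kind rely on the `Microsoft.Storage` service endpoint being enabled on each subnet.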
After this, you can mount the ADLS Gen2 storage. However, to read files from the folder, you also need to manage ACLs for both the container and the files.
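A mount might look like the following minimal sketch. It assumes authentication with a service principal (OAuth client credentials), which the answer above does not specify; every name in angle brackets and the helper functions themselves are placeholders. `dbutils` is only available inside a Databricks notebook, so it is passed in as a parameter.

```python
def abfss_source(container: str, account: str) -> str:
    """Build the abfss URI of a container root."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/"


def oauth_mount_configs(client_id: str, client_secret: str, tenant_id: str) -> dict:
    """Hadoop ABFS settings for service principal (client credentials) auth."""
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def mount_adls(dbutils, container: str, account: str, mount_point: str, configs: dict) -> None:
    """Mount the container; call this from a Databricks notebook."""
    dbutils.fs.mount(
        source=abfss_source(container, account),
        mount_point=mount_point,
        extra_configs=configs,
    )


# Example call from a notebook (placeholders, secret read from a secret scope):
# secret = dbutils.secrets.get(scope="<scope>", key="<key>")
# configs = oauth_mount_configs("<application-id>", secret, "<tenant-id>")
# mount_adls(dbutils, "<container>", "<storage-account>", "/mnt/<name>", configs)
```

Mounting only sets up the path. Whether a read succeeds still depends on the ACLs granted on the container and the files.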
ACLs can be managed for individual files in the same way, by right-clicking the file that needs to be accessed from the Databricks notebook.
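To see why both the container and the files need ACL entries, the sketch below lists the minimum permissions ADLS Gen2 checks when one file is read: execute on the container root and on every parent folder, and read on the file itself. The function and the sample path are illustrative only; `/` stands for the container root.

```python
from pathlib import PurePosixPath


def required_acls(file_path: str) -> list:
    """Return (path, permission) pairs needed to read one file in a container."""
    path = PurePosixPath("/") / file_path.lstrip("/")
    # Every ancestor, from the container root down, needs execute (--x).
    ancestors = list(reversed(path.parents))
    entries = [(str(p), "--x") for p in ancestors]
    # The file itself needs read (r--).
    entries.append((str(path), "r--"))
    return entries


for target, permission in required_acls("delta/events/part-0000.parquet"):
    print(f"{permission}  {target}")
```

Listing a folder, which Spark does when it reads a Delta table, additionally needs read on that folder (`r-x`), so in practice the table folder usually gets `r-x` rather than only `--x`.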
For more information, see:
Secure Access to Storage: Azure Databricks and Azure Data Lake Storage Gen2 Patterns
Deploy Azure Databricks in your Azure virtual network (VNet injection)