Tags: python, azure-databricks, mount-point

Azure Databricks: save a file to a mount point


Hi guys. I'm using a personal compute cluster on Azure Databricks and creating a mount point as follows:

configs = {"fs.azure.account.auth.type": "OAuth",
          "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
          "fs.azure.account.oauth2.client.id": "xxxxxx",
          "fs.azure.account.oauth2.client.secret": "xxxxx",
          "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/xxxxxx/oauth2/token"}


dbutils.fs.mount(
  source = "abfss://data@mystorage.dfs.core.windows.net/xxx/yyy",
  mount_point = "/mnt/MyMount/",
  extra_configs = configs)
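
For reference, the mount also shows up when listing all mounts (dbutils.fs.mounts() is the standard Databricks utility for this):

# Sanity check: the new mount appears in the list of all mounts
for m in dbutils.fs.mounts():
    print(m.mountPoint, "->", m.source)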

The mount is accessible when I list it with dbutils.fs.ls("/mnt/MyMount/"). But when I try to read and save data with the following code:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("MyApp").getOrCreate()
views = ['view_1', 'view_2', 'view_3', 'view_4', 'view_5']
logs = ['Log_views']

for view in views:
    print(f'Start saving view {view}.')
    # Read the Delta table with Spark, convert to pandas, and write a CSV to the mount
    spark.read.format("delta").load(f"/mnt/source/{view}").toPandas().to_csv(f"/mnt/MyMount/{view}.csv", index=False)
    print(f'View {view} saved successfully.')

for log in logs:
    # Same pattern for the parquet logs
    spark.read.format("parquet").load(f"/mnt/source/{log}").toPandas().to_csv(f"/mnt/MyMount/{log}.csv", index=False)
    print(f'Log {log} saved successfully.')

I got an error:

OSError: Cannot save file into a non-existent directory: '/mnt/MyMount'

I tried to remove and recreate the mount point and confirmed all access is working fine.


Solution

  • You need to include /dbfs/ in the path. Paths used outside the Spark context (for example, by pandas) are resolved against the driver's local root filesystem, and your mount is exposed there under the /dbfs folder.
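
    A minimal sketch of the difference (assuming the mount from the question): Spark and dbutils resolve /mnt/... paths against DBFS, while pandas runs as a plain Python process on the driver and sees the local filesystem, where DBFS is surfaced under /dbfs:

    import os

    # dbutils resolves the path against DBFS, so the mount is visible:
    dbutils.fs.ls("/mnt/MyMount/")

    # pandas resolves paths against the driver's local filesystem:
    print(os.path.exists("/mnt/MyMount"))       # False - no such local directory
    print(os.path.exists("/dbfs/mnt/MyMount"))  # True - the local FUSE view of the mount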

    Below is the data:

    [screenshot: sample source data]

    Code

    spark.read.format("parquet").load(f"/mnt/source/{log}").toPandas().to_csv(f"/dbfs/mnt/MyMount/{log}.csv", index=False)
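
    Applied to both loops from the question, only the output path changes (the Spark read paths stay on /mnt/, since Spark resolves them against DBFS directly):

    for view in views:
        df = spark.read.format("delta").load(f"/mnt/source/{view}")
        # Write through the local /dbfs view of the mount
        df.toPandas().to_csv(f"/dbfs/mnt/MyMount/{view}.csv", index=False)

    for log in logs:
        df = spark.read.format("parquet").load(f"/mnt/source/{log}")
        df.toPandas().to_csv(f"/dbfs/mnt/MyMount/{log}.csv", index=False)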
    
    


    And the output:

    [screenshot: the CSV file written to the mount]
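
    As a side note, a Spark-native write avoids the local-path issue entirely, because Spark's own writer understands the /mnt/ path; the trade-off is that Spark writes a directory of part files rather than a single CSV (the output directory name {log}_csv here is just an illustrative choice):

    (spark.read.format("parquet").load(f"/mnt/source/{log}")
          .coalesce(1)                      # produce a single part file
          .write.mode("overwrite")
          .option("header", True)
          .csv(f"/mnt/MyMount/{log}_csv"))  # Spark writes a directory, not a single file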

    For more information on working with files in Databricks, refer to the Databricks documentation on working with files.