I am running Azure Databricks 4.3 (includes Apache Spark 2.3.1, Scala 2.11).
I copied a CSV file from Azure Blob Storage onto the cluster's local disk using dbutils.fs.cp, adding file: to the absolute local_path:
copy_to = "file:" + local_path
dbutils.fs.cp(blob_storage_path, copy_to)
When I then try to read the file using the same path with file: added in front:
csv_spark_df = sqlContext.read.format('csv').options(header='true', inferSchema='true').load(copy_to)
I get an error message indicating that the given path does not exist:
java.io.FileNotFoundException: File file:/<local_path>
When I instead mount the Azure Blob Storage container (a sketch of the mount is shown below), I can read the file correctly with Spark using the same snippet above, passing the absolute local_path of the file in the mounted directory.
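A minimal sketch of that mount, assuming the standard dbutils.fs.mount approach; the container name, storage account name, mount point, and access key are placeholders, and the key would normally come from a secret scope rather than being hard-coded:
# Mount the Blob Storage container under /mnt/<mount-name> (placeholder names).
dbutils.fs.mount(
  source="wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
  mount_point="/mnt/<mount-name>",
  extra_configs={"fs.azure.account.key.<storage-account-name>.blob.core.windows.net": "<storage-account-access-key>"}
)
# Read the file through the mount point with the same reader as above.
csv_spark_df = sqlContext.read.format('csv').options(header='true', inferSchema='true').load("/mnt/<mount-name>/file.csv")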
Is it at all possible to read the CSV file that was copied from Azure Blob Storage this way, or is mounting the Azure Blob Storage container the preferred solution anyway?
I'm not certain what the file: prefix will map to.
I would have expected the path to be a DBFS path:
copy_to = "/path/file.csv"
This will be assumed to be a DBFS path.
You can always do:
dbutils.fs.ls("/path")
to verify the file copy.
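Putting that together, a minimal sketch of the flow using the same placeholder /path as above:
# Copy into DBFS (no file: prefix) and read it back with Spark.
copy_to = "/path/file.csv"
dbutils.fs.cp(blob_storage_path, copy_to)
dbutils.fs.ls("/path")  # confirm the file landed in DBFS
csv_spark_df = sqlContext.read.format('csv').options(header='true', inferSchema='true').load(copy_to)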
Though please note you do not need to copy the file to DBFS to load it into a DataFrame - you can read directly from the blob storage account, which would be the normal approach. Is there a reason you want to copy it locally?
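For reference, a sketch of that direct read; the storage account name, container name, and access key are placeholders, and in practice the key should come from a secret scope:
# Read the CSV straight from Blob Storage over wasbs:// without copying it first.
spark.conf.set(
  "fs.azure.account.key.<storage-account-name>.blob.core.windows.net",
  "<storage-account-access-key>"
)
wasbs_path = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/file.csv"
csv_spark_df = spark.read.format('csv').options(header='true', inferSchema='true').load(wasbs_path)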