When I attempt to create or save a table to a location in my Azure Data Lake Gen2 account using the following example code:
%sql
CREATE TABLE IF NOT EXISTS events USING parquet OPTIONS (path "/mnt/training/ecommerce/events/events.parquet");
I get the error:
[RequestId=xxxx-xxxx-6789-8u98-de33192c16e0 ErrorClass=INVALID_PARAMETER_VALUE] GenerateTemporaryPathCredential uri /mnt/training/ecommerce/events is not a valid URI. Error message: INVALID_PARAMETER_VALUE: Missing cloud file system scheme.
I did some research and came across the following suggestion from Databricks:
In Databricks, when reading data from cloud storage like AWS S3, Azure Data Lake Storage, or Google Cloud Storage, you must include the scheme corresponding to the cloud storage.
I appreciate that this is a correct suggestion; however, I have mounted the ADLS storage account in Databricks, and I'm able to read files from the ADLS account using:
df = spark.read.csv("/mnt/training/ecommerce/events/events.csv", inferSchema=True, header=True)
Any thoughts?
I attempted the solution suggested by @Bhavani, but unfortunately it didn't work. I added a new external location as suggested:
When I try to create a table using the mounted drive, I get the same error:
Just to show that the drive is mounted, see the below image:
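Equivalently, a quick dbutils check confirms the mount (a minimal sketch; /mnt/files is the mount point taken from the error message below):

# List all mounts and confirm /mnt/files points at the ADLS account
for m in dbutils.fs.mounts():
    if m.mountPoint == "/mnt/files":
        print(m.mountPoint, "->", m.source)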
The crazy thing is, I can read from the mounted drive; see the image below:
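The read that succeeds is along these lines (a sketch; the path matches the one in the error message):

# Reading the same parquet file through the mount works, without any scheme
df = spark.read.parquet("/mnt/files/Iris.parquet")
df.show(5)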
One last update to show you that the connection to the ADLS Gen2 mounted drive is successful; see the image. So I'm not sure why I'm still getting the error:
INVALID_PARAMETER_VALUE] GenerateTemporaryPathCredential uri /mnt/files/Iris.parquet is not a valid URI. Error message: INVALID_PARAMETER_VALUE: Missing cloud file system scheme.
When working with Azure Data Lake Storage Gen2 in a Synapse or Databricks environment, you need to specify the scheme (abfss:// for ADLS Gen2). You provided a mounted path, which may be the reason you are getting the above error. Instead of mounting the ADLS Gen2 account, you can follow the procedure below:
Grant the Storage Blob Data Contributor role to the Azure Databricks managed identity. Then go to the Catalog page in the Databricks workspace, click +, and select Add an external location as shown below:
Configure the details while creating the new external location as shown below:
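If you prefer code over the UI, the same external location can be created from a Python cell with CREATE EXTERNAL LOCATION (a sketch; the location name and the storage credential name are placeholders, and the credential must already exist in Unity Catalog):

# Create the external location via SQL instead of the Catalog UI
spark.sql("""
    CREATE EXTERNAL LOCATION IF NOT EXISTS my_ext_location
    URL 'abfss://<containerName>@<storageAccountName>.dfs.core.windows.net/'
    WITH (STORAGE CREDENTIAL `<credentialName>`)
""")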
Create the external location. After it has been created successfully, you will be able to create the table using the code below:
%sql
CREATE TABLE IF NOT EXISTS events USING parquet OPTIONS (path "abfss://<containerName>@<storageAccountName>.dfs.core.windows.net/<filepath>");
You can then query the table successfully, as shown below:
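For example, a quick check like this (a sketch run from a Python cell) returns rows from the new table:

# Query the table created above to verify it resolves via the external location
spark.sql("SELECT * FROM events LIMIT 10").show()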
Alternatively, you can use the code below to create a table from the mounted storage account:
%sql
CREATE TABLE events1 AS
SELECT *
FROM read_files(
  '/mnt/<mountName>/<pathToParquetFile>/Iris.parquet',
  format => 'parquet'
);
This will create the table successfully.
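A quick row count (a sketch from a Python cell) verifies the result:

# Confirm the mounted-path table was created and is readable
spark.sql("SELECT COUNT(*) AS n FROM events1").show()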