python-3.x azure-databricks rasterio azure-data-lake-gen2

Read Azure Data Lake Gen2 images from Azure Databricks


I am working with .tif files stored in Azure Data Lake Gen2 and want to open these files with rasterio from Azure Databricks.

For example, reading the image file from Data Lake with spark.read.format("image").load(filepath) works fine.


But when I try to open the same file with rasterio:

import rasterio

with rasterio.open(filepath) as src:
    print(src.profile)

I get this error:

RasterioIOError: wasbs://xxxxx.blob.core.windows.net/xxxx_2016/xxxx_2016.tif: No such file or directory

Any clues as to what I am doing wrong?

Update:

As suggested by Axel R, I mounted the files on the Databricks file system, but I still get the same issue: I cannot open the file with rasterio, although I can read it as a DataFrame.


I also tried creating a shared access signature (SAS) for the file in Data Lake and accessing the file through its URI. Now I get the following error:

CURL error: error setting certificate verify locations:   CAfile: /etc/pki/tls/certs/ca-bundle.crt   CApath: none
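A sketch of the SAS route, assuming the blob URI has the form wasbs://<container>@<account>.blob.core.windows.net/<path>: rewrite it as an HTTPS URL with the SAS token appended, and point GDAL's curl at a CA bundle that actually exists on the cluster through the CURL_CA_BUNDLE GDAL config option. The helper names (find_ca_bundle, wasbs_to_sas_url, print_profile) and the candidate bundle paths are hypothetical; check which bundle file is present on your Databricks runtime.

```python
import os
from typing import Optional, Sequence
from urllib.parse import urlparse

# Hypothetical candidate locations; which one exists depends on the OS image.
# The error above shows curl looking for the Red Hat-style path, which may not
# exist on a Debian/Ubuntu-based image.
CA_BUNDLE_CANDIDATES = (
    "/etc/ssl/certs/ca-certificates.crt",  # Debian/Ubuntu
    "/etc/pki/tls/certs/ca-bundle.crt",    # RHEL/CentOS
)


def find_ca_bundle(candidates: Sequence[str] = CA_BUNDLE_CANDIDATES) -> Optional[str]:
    """Return the first candidate CA bundle file that exists, or None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def wasbs_to_sas_url(wasbs_uri: str, sas_token: str) -> str:
    """Rewrite wasbs://<container>@<host>/<path> as https://<host>/<container>/<path>?<sas>."""
    parsed = urlparse(wasbs_uri)
    if parsed.scheme != "wasbs" or "@" not in parsed.netloc:
        raise ValueError("expected wasbs://<container>@<host>/<path>")
    container, host = parsed.netloc.split("@", 1)
    token = sas_token.lstrip("?")  # accept tokens with or without a leading '?'
    return f"https://{host}/{container}{parsed.path}?{token}"


def print_profile(url: str) -> None:
    # Imported here so the helpers above can be used without rasterio installed.
    import rasterio

    ca_bundle = find_ca_bundle()
    options = {"CURL_CA_BUNDLE": ca_bundle} if ca_bundle else {}
    # rasterio.Env applies these as GDAL config options inside the block.
    with rasterio.Env(**options):
        with rasterio.open(url) as src:
            print(src.profile)
```

This keeps the token out of any global environment and only changes GDAL's curl settings for the duration of the open.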

To test further, I tried to open a sample file from the web, and that works fine:

filepath = 'http://landsat-pds.s3.amazonaws.com/c1/L8/042/034/LC08_L1TP_042034_20170616_20170629_01_T1/LC08_L1TP_042034_20170616_20170629_01_T1_B4.TIF'


Solution

  • I believe this happens because rasterio uses the local file APIs, so it can only read from a path that starts with /dbfs/.

    Is it possible for you to mount the blob storage? That would let you access it with rasterio through a path starting with /dbfs/mnt/.
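Following the answer's suggestion, a minimal sketch of turning a Spark-style DBFS path (dbfs:/mnt/...) into the /dbfs/... FUSE path that local-file libraries such as rasterio can open. It assumes the container is already mounted under /mnt (for example with dbutils.fs.mount in a notebook); the helper to_local_dbfs_path is hypothetical.

```python
def to_local_dbfs_path(path: str) -> str:
    """Map a DBFS path to the /dbfs FUSE path used by local file APIs."""
    if path.startswith("/dbfs/"):
        return path  # already a local FUSE path
    if path.startswith("dbfs:/"):
        # dbfs:/mnt/x.tif -> /dbfs/mnt/x.tif (also tolerates dbfs:///mnt/...)
        return "/dbfs/" + path[len("dbfs:/"):].lstrip("/")
    if path.startswith("/mnt/"):
        # Assumption: a bare /mnt/... path means the DBFS mount point, as in Spark.
        return "/dbfs" + path
    # wasbs://, abfss://, etc. are not local paths, so reject them explicitly.
    raise ValueError(f"not a DBFS path: {path!r}")


def print_profile(path: str) -> None:
    import rasterio  # only needed when actually opening the file

    with rasterio.open(to_local_dbfs_path(path)) as src:
        print(src.profile)
```

Passing the dbfs:/ path that works with spark.read straight to rasterio.open would likely fail for the same reason as the wasbs:// path; the /dbfs/ prefix is what routes the read through the local file system.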