Azure Blob Storage (WASBS) - How to get permission to access?


I would like to parse a text file in my blob storage container, but I get an error message that says: permission denied when access stream.

I guess I have to use an access key for my storage account, but how can I do this in the code?

My program works when using a public blob container from Microsoft Learn.

# Auth
from azure.identity import DefaultAzureCredential
from azure.ai.ml.entities import AmlCompute
from azure.ai.ml import UserIdentityConfiguration
from azure.ai.ml import MLClient, command, Input
from azure.ai.ml.constants import AssetTypes, InputOutputModes
import pandas as pd

from azure.storage.blob import BlobServiceClient

ml_client = MLClient.from_config(credential=DefaultAzureCredential())

gpu_compute_target = "gpu-cluster"

curated_env_name = "TensorFlow_Train_Env:2"

blob_csv_path = "wasbs://mycontainername@mystorageaccount.blob.core.windows.net/read1.txt"

job = command(
    inputs={
        "csv_file" : Input(type=AssetTypes.URI_FILE, path=blob_csv_path, mode=InputOutputModes.RO_MOUNT),
    },
    compute=gpu_compute_target,
    environment=curated_env_name,
    code="./src/",
    command="python test2.py --data-file ${{inputs.csv_file}}",
    experiment_name="tf-test-expname",
    display_name="tensorflow-test_displayname",
)

ml_client.jobs.create_or_update(job)

Solution

  • To read a CSV from Blob Storage, you can also pass the path using either the https:// protocol or the azureml:// protocol.

    First, create a datastore using the storage account where your CSV is located.

    Go to Data > Datastores > Create.

    Next, give the datastore a name, provide the access key, and click Create.

    After creating the datastore, browse to your CSV file and click Copy URI. You will be shown two kinds of URIs.

    Select either one and pass it as the path of your job input. The datastore can then be reused for any data in your blob storage, with no need to configure access each time.
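
If you would rather do the same thing from code, here is a minimal sketch. It assumes the azure-ai-ml v2 SDK already used in the question. The datastore name my_blob_datastore and the helper names are made up for illustration, and the short azureml://datastores/<name>/paths/<path> form is used here, which may look different from the longer URI the studio gives you to copy.

```python
def datastore_uri(datastore_name: str, relative_path: str) -> str:
    """Build the short-form azureml:// URI for a file inside a registered datastore."""
    return f"azureml://datastores/{datastore_name}/paths/{relative_path.lstrip('/')}"


def register_blob_datastore(ml_client, name, account_name, container_name, account_key):
    """Register a blob container as a datastore, authenticated with an access key.

    Code equivalent of Data > Datastores > Create in the studio.
    """
    # Imported lazily so the URI helper above works without the SDK installed.
    from azure.ai.ml.entities import AccountKeyConfiguration, AzureBlobDatastore

    store = AzureBlobDatastore(
        name=name,
        account_name=account_name,
        container_name=container_name,
        credentials=AccountKeyConfiguration(account_key=account_key),
    )
    return ml_client.create_or_update(store)


def make_file_input(uri: str):
    """Wrap a URI as a read-only mounted file input, as in the question's job."""
    from azure.ai.ml import Input
    from azure.ai.ml.constants import AssetTypes, InputOutputModes

    return Input(type=AssetTypes.URI_FILE, path=uri, mode=InputOutputModes.RO_MOUNT)


# Hypothetical datastore name; the file name is the one from the question.
csv_uri = datastore_uri("my_blob_datastore", "read1.txt")
print(csv_uri)
```

In the question's script you would call register_blob_datastore(ml_client, "my_blob_datastore", "mystorageaccount", "mycontainername", <your access key>) once, then use inputs={"csv_file": make_file_input(csv_uri)} in place of the wasbs:// path. Reading the key from an environment variable or a secret store is probably safer than hard-coding it.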