azure-active-directory azure-machine-learning-service azure-managed-identity azure-data-lake-gen2

How to give an Azure ML compute cluster access to Data Lake Gen2 storage?


When I run an Azure ML pipeline (Python SDK v2) that connects to Data Lake storage on a compute cluster, I get an error saying the compute cluster doesn't have access to the storage. When I run the same pipeline on the compute instance that is assigned to me, it works without any problem. However, I'm trying to automate this with a Synapse pipeline, so I need to use a compute cluster, not a compute instance.

I tried giving the Azure ML workspace Owner access to the Data Lake Gen2 storage, but that didn't fix the problem. I also tried using a managed identity on the cluster and got a principal ID, but when I try to add that principal ID to the Data Lake storage under Access control (IAM), I can't enter a principal ID; I can only type a user name.


Solution

  • Follow the steps below to grant your cluster access to the storage account through its managed identity.

    1. Copy the principal ID you obtained when creating the compute cluster.

    2. Go to Enterprise applications in Microsoft Entra ID.

    3. Search for the managed identity using the principal ID you copied earlier.

    4. You will see the name and details of the managed identity. Copy its name, then go to the ADLS Gen2 account and grant access to that name under Access control (IAM).

    Assign Storage Blob Data Contributor or Storage Blob Data Reader, depending on whether the cluster needs to write data or only read it.

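If you would rather skip the portal lookup, the same role assignment can probably be made from the Azure CLI by passing the principal ID directly. The sketch below is not part of the steps above. It only builds and prints an `az role assignment create` command so you can review it before running it yourself. The helper names (`storage_account_scope`, `build_role_assignment_command`) and the placeholder values are made up for illustration, and it assumes you are logged in with an account that is allowed to assign roles on the storage account.

```python
import shlex

# Built-in role names mentioned in the answer above.
ALLOWED_ROLES = ("Storage Blob Data Reader", "Storage Blob Data Contributor")


def storage_account_scope(subscription_id, resource_group, account_name):
    """Return the ARM resource ID of a storage account, used as the role scope."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account_name}"
    )


def build_role_assignment_command(principal_id, role, scope):
    """Build the `az role assignment create` argument list for a managed identity.

    Hypothetical helper: it only assembles the arguments, it does not call Azure.
    """
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unexpected role: {role!r}")
    return [
        "az", "role", "assignment", "create",
        # The principal ID of the cluster's managed identity (its object ID).
        "--assignee-object-id", principal_id,
        # Managed identities are service principals in Microsoft Entra ID.
        "--assignee-principal-type", "ServicePrincipal",
        "--role", role,
        "--scope", scope,
    ]


if __name__ == "__main__":
    # Placeholder values (hypothetical); replace them with your own.
    scope = storage_account_scope(
        "<subscription-id>", "<resource-group>", "<storage-account>"
    )
    cmd = build_role_assignment_command(
        "<principal-id>", "Storage Blob Data Reader", scope
    )
    # Print the shell-quoted command instead of executing it.
    print(shlex.join(cmd))
```

Copy the printed command into a shell once the placeholders are filled in. Role assignments can take a few minutes to take effect, so the pipeline may still fail if you rerun it immediately.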