python, azure, azure-data-lake, azure-machine-learning-service, ml-studio

Mount a Data Lake storage in Azure ML Studio


I created a file dataset from a data lake folder in Azure ML Studio. At the moment I'm able to download the data from the dataset to the compute instance with this code:

from azureml.core import Workspace, Dataset

# Connect to the workspace
subscription_id = 'xxx'
resource_group = 'luisdatapipelinetest'
workspace_name = 'ml-pipelines'
workspace = Workspace(subscription_id, resource_group, workspace_name)

# Fetch the registered file dataset and copy its files to the compute instance
dataset = Dataset.get_by_name(workspace, name='files_test')
path = "/mnt/batch/tasks/shared/LS_root/mounts/clusters/demo1231/code/Users/luis.rramirez/test/"
dataset.download(target_path=path, overwrite=True)

With that I'm able to access the files from the notebook.


But copying the data from the data lake to the compute instance is not efficient. How can I mount the data lake directory in the VM instead of copying the data each time?


Solution

  • Mount the ADLS Gen2 storage in AML so that you can read and save files directly through your mount point. First register the storage as a datastore, then mount your registered datastore.
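
As a rough illustration of the mounting step, here is a minimal sketch. It assumes the `azureml-core` SDK is installed, that the data lake is already registered as a datastore, and that the file dataset `files_test` from the question exists. The helper name `list_mounted_files` and the temporary-directory default are hypothetical choices made for this example, not part of the SDK. It relies on `FileDataset.mount()`, which returns a context manager that mounts on entry and unmounts on exit.

```python
import os
import tempfile


def list_mounted_files(dataset, mount_point=None):
    """Mount a file dataset, list the files under it, then unmount.

    `dataset` is expected to behave like an azureml FileDataset, i.e. to
    offer mount(mount_point) returning a context manager that exposes a
    `mount_point` attribute. This helper is hypothetical, for illustration.
    """
    # Assumption: use a fresh temporary directory when no path is given.
    mount_point = mount_point or tempfile.mkdtemp()
    # Entering the context mounts the dataset; leaving it unmounts.
    with dataset.mount(mount_point) as mount_context:
        root = mount_context.mount_point
        found = []
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full_path = os.path.join(dirpath, name)
                found.append(os.path.relpath(full_path, root))
        return sorted(found)


def main():
    # Imported here so the helper above can be tried without the SDK.
    from azureml.core import Workspace, Dataset

    # Same workspace values as in the question ('xxx' is a placeholder).
    workspace = Workspace('xxx', 'luisdatapipelinetest', 'ml-pipelines')
    dataset = Dataset.get_by_name(workspace, name='files_test')
    # Files are read through the mount on demand, not copied up front.
    print(list_mounted_files(dataset))
```

Calling `main()` from a notebook cell on the compute instance should print the relative paths of the files in the dataset. If the mount needs to stay available across several cells, the context object returned by `dataset.mount()` can instead be controlled with its `start()` and `stop()` methods.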