mlflow · azureml-python-sdk

AzureML not finding DBFS path when registering a model locally


I have a set of ML models, tracked and registered with mlflow in Databricks, that I want to register in AzureML. The model .pkl files are stored on DBFS, and when I run the code below in a Databricks notebook it works as expected. However, when I execute the same code from my local machine, azureml can't find the model path, apparently searching the local project paths instead:

azureml.exceptions._azureml_exception.WebserviceException: WebserviceException:
Message: Error, provided model path "/dbfs/FileStore/path/to/model/model.pkl" cannot be found
InnerException None
ErrorResponse
{
    "error": {
        "message": "Error, provided model path \"/dbfs/FileStore/path/to/model/model.pkl\" cannot be found"
    }
}

This happens whether or not I run the code with pyspark and databricks-connect. What's the best way of pointing azureml at the correct DBFS storage? Thanks

# in Databricks, I run %pip install mlflow==2.1.1 azureml-sdk[databricks] azureml-mlflow

import json
import mlflow
import os
from azureml.core import Workspace, Experiment, Run
from azureml.core.model import Model
from azureml.core.authentication import ServicePrincipalAuthentication

top_run_id = '123456789'  # MLflow run IDs are strings
top_run_dict = mlflow.get_run(top_run_id).to_dictionary()


### connect to AZURE ML
subscription_id = '<my-subscription-id>'
resource_group = '<my-resource-group>'
workspace_name = '<my-workspace-name>'

## set up the AML-DB workspace communication
svc_pr = ServicePrincipalAuthentication(
    tenant_id='<my-tenant-id>',
    service_principal_id='<my-service-principal-id>',
    service_principal_password='<my-service-principal-password>'
    )

ws = Workspace(
         subscription_id=subscription_id,
         resource_group=resource_group,
         workspace_name=workspace_name,
         auth=svc_pr
)

model_name = 'mlflow_local_azureml_test'
model_uri = json.loads(
    mlflow.get_run(top_run_id).data.tags['mlflow.log-model.history']
)[0]['flavors']['python_function']['artifacts']['model_path']['uri']
model_description = 'Dummy model'
model_tags = {
  "Type": "RandomForest",
  "Run ID": top_run_id,
  "Metrics": mlflow.get_run(top_run_id).data.metrics
}

registered_model = Model.register(
  model_path=model_uri, 
  model_name=model_name,
  tags=model_tags,
  description=model_description,
  workspace=ws
)

Solution

  • The path "/dbfs/FileStore/path/to/model/model.pkl" is a Databricks File System (DBFS) path, which is not accessible from outside Databricks. When you run the code from your local machine, you need a path that the local machine can actually reach. One way to do this is to download the model file from DBFS to your local machine and then register the model using that local path. The steps below use the Databricks CLI; a programmatic alternative with the MLflow API is sketched at the end of this answer.

    1. You can use the Databricks CLI to download the file.

    First, you need to install the databricks-cli package:

    pip install databricks-cli
    

    Then, configure the Databricks CLI with your Databricks workspace URL and access token:

    databricks configure --token
    

    This will prompt you for your workspace host and access token; enter them.
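
    If you prefer not to create a CLI profile, both the Databricks CLI and MLflow also honor the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables. A minimal sketch of setting them from Python, with placeholder values to substitute for your own (this only affects the current process and its children, so export them in your shell if you want the CLI itself to pick them up):

    import os

    # Placeholder values -- substitute your own workspace URL and a
    # personal access token. Setting them here covers MLflow calls made
    # from this Python process (see the sketch at the end of this answer).
    os.environ["DATABRICKS_HOST"] = "https://<my-workspace>.azuredatabricks.net"
    os.environ["DATABRICKS_TOKEN"] = "<my-access-token>"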


    You can confirm the model path by listing the DBFS directory:


    databricks fs ls dbfs:/FileStore/tables/
    


    Now copy it to your local system:

    databricks fs cp dbfs:/FileStore/path/to/model/model.pkl ./model.pkl
    


    Update the model_uri variable in your code to point to the local file:

    model_uri = "./model.pkl"
    

    Now, when you run the code from your local machine, it should be able to find the model file and register it with AzureML.
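
  • Alternatively, you can skip the CLI copy step and download the artifact with the MLflow Python API from your local machine. A minimal sketch, assuming your local MLflow can authenticate to the Databricks workspace (via the CLI profile or the environment variables above) and that the dbfs:/ URI below matches where your model actually lives:

    import mlflow

    # Point MLflow at the Databricks workspace; authentication comes from
    # the databricks CLI profile or DATABRICKS_HOST/DATABRICKS_TOKEN.
    mlflow.set_tracking_uri("databricks")

    # Download the model artifact from DBFS into the current directory
    # and get back a local filesystem path.
    local_path = mlflow.artifacts.download_artifacts(
        artifact_uri="dbfs:/FileStore/path/to/model/model.pkl",
        dst_path=".",
    )

    model_uri = local_path  # a local path that Model.register can upload

    Either way, the underlying point is the same: Model.register uploads the file at model_path from the machine where the code runs, so the path must be resolvable on the local filesystem, not only on DBFS.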