I have a set of ML models that were tracked and registered with MLflow in Databricks, and I now want to register them in AzureML as well. The model .pkl files are stored on DBFS, and when I run the code below in a Databricks notebook it works as expected. However, when I execute the same code from my local machine, AzureML can't find the model path, presumably because it searches the local project paths instead:
azureml.exceptions._azureml_exception.WebserviceException: WebserviceException:
	Message: Error, provided model path "/dbfs/FileStore/path/to/model/model.pkl" cannot be found
	InnerException None
	ErrorResponse
{
    "error": {
        "message": "Error, provided model path "/dbfs/FileStore/path/to/model/model.pkl" cannot be found"
    }
}
This happens regardless of whether or not I run the code with pyspark and databricks-connect. What's the best way of pointing AzureML at the correct DBFS storage? Thanks
# in Databricks, I run %pip install mlflow==2.1.1 azureml-sdk[databricks] azureml-mlflow
import json
import mlflow
import os
from azureml.core import Workspace, Experiment, Run
from azureml.core.model import Model
from azureml.core.authentication import ServicePrincipalAuthentication
top_run_id = '<my-run-id>'  # MLflow run IDs are strings, not integers
top_run_dict = mlflow.get_run(top_run_id).to_dictionary()
### connect to AZURE ML
subscription_id = '<my-subscription-id>'
resource_group = '<my-resource-group>'
workspace_name = '<my-workspace-name>'
## set up the AML-DB workspace communication
svc_pr = ServicePrincipalAuthentication(
tenant_id='<my-tenant-id>',
service_principal_id='<my-service-principal-id>',
service_principal_password='<my-service-principal-password>'
)
ws = Workspace(
subscription_id=subscription_id,
resource_group=resource_group,
workspace_name=workspace_name,
auth=svc_pr
)
model_name = 'mlflow_local_azureml_test'
model_uri = json.loads(mlflow.get_run(top_run_id).data.tags['mlflow.log-model.history'])[0]['flavors']['python_function']['artifacts']['model_path']['uri']
model_description = 'Dummy model'
model_tags = {
"Type": "RandomForest",
"Run ID": top_run_id,
"Metrics": mlflow.get_run(top_run_id).data.metrics
}
registered_model = Model.register(
model_path=model_uri,
model_name=model_name,
tags=model_tags,
description=model_description,
workspace=ws
)
The path "/dbfs/FileStore/path/to/model/model.pkl"
is a Databricks File System (DBFS) path, which is not accessible from outside of Databricks. When you run the code from your local machine, you need to use a path that is accessible from your local machine. One way to do this is to download the model file from DBFS to your local machine and then use the local path to the downloaded file for registering the model.
You can use the Databricks CLI to do the download.
1. Install the databricks-cli package:
pip install databricks-cli
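You can check that the CLI installed correctly with:
databricks --version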
2. Configure the Databricks CLI with your Databricks workspace URL and a personal access token:
databricks configure --token
This will prompt you for the host and the token; enter them.
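For reference, this writes a config file at ~/.databrickscfg that looks roughly like the following (the host and token values here are placeholders):
[DEFAULT]
host = https://<my-databricks-instance>.azuredatabricks.net
token = <my-access-token>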
3. Verify that the model path exists in DBFS by listing it:
databricks fs ls dbfs:/FileStore/path/to/model/
This prints the files under that path, so you can confirm model.pkl is there.
4. Now copy it to your local system:
databricks fs cp dbfs:/FileStore/path/to/model/model.pkl ./model.pkl
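Note: if your run logged a full MLflow model directory (MLmodel, conda.yaml, the pickle, etc.) rather than a single .pkl file, copy the directory recursively instead; the local directory name here is just an example:
databricks fs cp -r dbfs:/FileStore/path/to/model ./model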
5. Update the model_uri variable in your code to point to the local file:
model_uri = "./model.pkl"
Now, when you run the code from your local machine, it should be able to find the model file and register it with AzureML.
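Alternatively, since you already have mlflow installed locally, you can skip the CLI and let MLflow download the run's artifacts for you. Below is a minimal sketch; it assumes your local environment can authenticate against the Databricks tracking server (for example via the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables, or the ~/.databrickscfg written above), and that the model was logged under the artifact path "model", which you should adjust to whatever your run actually used:

import os
import mlflow
from mlflow.artifacts import download_artifacts

# point the local MLflow client at the Databricks tracking server
mlflow.set_tracking_uri("databricks")

# download the logged model artifacts to a local directory;
# download_artifacts returns the local path of whatever it fetched
os.makedirs("./downloaded_model", exist_ok=True)
local_path = download_artifacts(
    run_id=top_run_id,
    artifact_path="model",  # assumption: the artifact path used when the model was logged
    dst_path="./downloaded_model",
)

# register the local copy with AzureML exactly as before
model_uri = local_path

This keeps the whole flow in Python, so the download and the Model.register call can live in the same script.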