I'm using Azure Machine Learning Studio and I have an sklearn MLflow
model stored in my default datastore (blob storage), which I have then registered as a model asset. How can I load this model inside an interactive notebook to perform some quick inferencing and testing before deploying it as a batch endpoint?
I have seen a post linked here that suggests downloading the model artefacts locally, but I shouldn't need to do this: I should be able to load the model directly from the datastore or the registered asset without duplicating the model in multiple places. I have tried the following without success.
Reading from Registered Model Asset
import mlflow
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(DefaultAzureCredential(), "<subscription_id>", "<resource_group>", "<workspace_name>")
model = ml_client.models.get("<model_name>", version="1")
loaded_model = mlflow.sklearn.load_model(model.id)
>>> OSError: No such file or directory: ...
Reading from Datastore
import mlflow
model_path = "<datastore_uri_to_model_folder>"
loaded_model = mlflow.sklearn.load_model(model_path)
>>> DeserializationError: Cannot deserialize content-type: text/html
According to this documentation, the path given to `load_model` must take one of the following forms:
/Users/me/path/to/local/model
relative/path/to/local/model
s3://my_bucket/path/to/model
runs:/<mlflow_run_id>/run-relative/path/to/model
models:/<model_name>/<model_version>
models:/<model_name>/<stage>
But you are giving a model ID and a datastore path, neither of which is supported here. So, try this code:
loaded_model = mlflow.sklearn.load_model("models:/local-mlflow-example/1")
loaded_model.predict(sample_data["data"])
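One thing to note: the `models:/` scheme resolves against MLflow's active tracking URI, so in a local or interactive session you may first have to point MLflow at the workspace. Below is a minimal sketch, assuming `azure-ai-ml`, `azure-identity` and the `azureml-mlflow` plugin are installed; the workspace identifiers are placeholders:

```python
def model_uri(name: str, version: int) -> str:
    # Build the models:/<name>/<version> URI that mlflow.sklearn.load_model expects
    return f"models:/{name}/{version}"

def load_registered_model(name: str, version: int):
    """Sketch: point MLflow at the workspace registry, then load the model."""
    import mlflow
    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential

    ml_client = MLClient(
        DefaultAzureCredential(),
        "<subscription_id>", "<resource_group>", "<workspace_name>",
    )
    # The workspace object exposes its MLflow tracking URI; set it
    # before resolving any models:/ path
    ws = ml_client.workspaces.get(ml_client.workspace_name)
    mlflow.set_tracking_uri(ws.mlflow_tracking_uri)
    return mlflow.sklearn.load_model(model_uri(name, version))
```

On an Azure ML compute instance the tracking URI is usually already configured, in which case the `models:/` call works directly.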
In an interactive notebook the path should be `models:/<model_name>/<model_version>`. A `model.id`, `model.path`, or datastore path is only supported inside the context of an Azure ML job. So, to use `model.id` or `model.path`, submit a command job like the one below.
from azure.ai.ml import command, Input, Output
from azure.ai.ml.constants import AssetTypes

inputs = {
    "input_data": Input(
        type=AssetTypes.URI_FILE, path="./mlflow-model/input_example.json"
    ),
    "input_model": Input(type=AssetTypes.MLFLOW_MODEL, path=model.path),
}
outputs = {
    "output_folder": Output(
        type=AssetTypes.URI_FOLDER,
        path=f"azureml://subscriptions/{subscription_id}/resourcegroups/{resource_group}/workspaces/{workspace}/datastores/workspaceblobstore/paths/predictions",
    )
}

job = command(
    code="./src",  # local path where the code is stored
    command="python load_score.py --input_model ${{inputs.input_model}} --input_data ${{inputs.input_data}} --output_folder ${{outputs.output_folder}}",
    inputs=inputs,
    outputs=outputs,
    environment="AzureML-sklearn-1.0-ubuntu20.04-py38-cpu:1",
    compute="cpu-cluster",
)
# submit the command
returned_job = ml_client.jobs.create_or_update(job)
# get a URL for the status of the job
returned_job.studio_url
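After submission you can wait for the run and pull the predictions back without browsing the datastore yourself. A small sketch using the same `ml_client` (`jobs.stream` blocks until the job finishes, `jobs.download` fetches a named output):

```python
def download_kwargs(job_name: str, output_name: str = "output_folder"):
    # Keyword arguments for MLClient.jobs.download to fetch one named output
    return {"name": job_name, "output_name": output_name}

def fetch_predictions(ml_client, job_name: str, to: str = "./job-output"):
    """Sketch: block until the job completes, then download its output folder."""
    ml_client.jobs.stream(job_name)  # tails the logs until the job terminates
    ml_client.jobs.download(download_path=to, **download_kwargs(job_name))
```

The `output_name` here matches the `"output_folder"` key defined in the `outputs` dictionary above.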
load_score.py is the script which loads the model, prints the predictions, and writes them to the output folder:
import argparse
import json
import os

import mlflow.sklearn

parser = argparse.ArgumentParser()
parser.add_argument("--input_data", type=str)
parser.add_argument("--input_model", type=str)
parser.add_argument("--output_folder", type=str)
args = parser.parse_args()

# Load the sample input data
with open(args.input_data) as f:
    sample_data = json.load(f)
print(sample_data)

# Load the MLflow model from the mounted input path
sk_model = mlflow.sklearn.load_model(args.input_model)
predictions = sk_model.predict(sample_data["data"])

# Writing to stdout
print(predictions)

# Writing the predictions to the output folder
with open(os.path.join(args.output_folder, "predictions.txt"), "x") as output:
    output.write(str(predictions))
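For completeness, the script expects the JSON file to carry a `data` key holding the feature rows. A hypothetical `input_example.json` (the feature values and column count depend entirely on your model) could look like:

```json
{
  "data": [
    [5.1, 3.5, 1.4, 0.2],
    [6.2, 2.9, 4.3, 1.3]
  ]
}
```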
Refer to this notebook for more information.
If you face a "no such file" error, that is because a required artefact, such as the MLmodel file or model.pkl, has been deleted from the storage account or moved to another folder.