mlflow

Getting artifacts from a registered model in mlflow


I'm learning mlflow, primarily for tracking my experiments now, but in the future more as a centralized model db where I could update a model for a certain task and deploy the updated version locally without changing the API.

In my problem, the inference data needs some processing before it is passed to the ML models, and the parameters for this processing are part of model development. So when I want to do inference, I need to retrieve these parameters to prepare the input to the model. At the moment I attach these parameters as JSON to MLflow runs, but when I register the model they don't seem to be included.

Is there a streamlined way of doing this? I'm doing everything locally at the moment (and registering the chosen model through the UI), but I want to make it robust for when I move to an MLflow server.

At the moment I found that I can go from registered model through metadata.run_id to fetch this artifact, but is there a better way?

import mlflow

model_uri = "models:/foo"
model = mlflow.pyfunc.load_model(model_uri)
# The loaded model's metadata records the run that produced it
run_id = model.metadata.run_id
params_path = "runs:/" + run_id + "/params.json"
params = mlflow.artifacts.load_dict(params_path)

Solution

  • You are running into a deliberate design choice in MLflow: metadata (such as runs and params) is kept separate from artifacts (such as models with weights). You are expected to link the two through run IDs, much as you proposed. However, use the API to resolve artifact paths rather than building them by hand wherever possible. In your case:

    artifact_uri = mlflow.get_run(run_id).info.artifact_uri
    mlflow.artifacts.load_dict(artifact_uri + "/params.json")
    # {'C': 0.1, 'random_state': 42}
    

    As for the run ID, you can get it from the registered model:

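    import mlflow
    from mlflow.tracking import MlflowClient

    # get the run id from the latest registered version
    # (get_latest_versions returns one entry per stage)
    client = MlflowClient(mlflow.get_tracking_uri())
    model_info = client.get_latest_versions('toy-model')[0]
    run_id = model_info.run_id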
    

    PS: MLflow is great :-)
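
The two snippets above can be combined into a single helper. This is only a minimal sketch, not an official MLflow recipe: the function name load_params_for_model and its parameters are hypothetical. It assumes the file was logged as params.json at the root of the run's artifacts, and it picks the highest version number among whatever get_latest_versions returns. The client and the loader are passed in as arguments so the lookup logic can be exercised without a tracking server.

```python
import posixpath
from typing import Any, Callable, Dict


def load_params_for_model(
    name: str,
    client: Any,
    load_dict: Callable[[str], Dict[str, Any]],
    filename: str = "params.json",
) -> Dict[str, Any]:
    """Load a JSON artifact from the run behind a registered model.

    `client` is expected to behave like mlflow.tracking.MlflowClient and
    `load_dict` like mlflow.artifacts.load_dict (both are assumptions).
    """
    versions = client.get_latest_versions(name)
    if not versions:
        raise LookupError(f"no registered versions found for model {name!r}")

    # Version numbers are strings, so compare them as integers.
    newest = max(versions, key=lambda v: int(v.version))

    # Let the tracking API report where the run's artifacts live,
    # then append the file name instead of hand-building the whole URI.
    artifact_uri = client.get_run(newest.run_id).info.artifact_uri
    return load_dict(posixpath.join(artifact_uri, filename))
```

With MLflow installed, a call would look something like load_params_for_model("toy-model", MlflowClient(), mlflow.artifacts.load_dict), where "toy-model" is the registered model name used in the answer.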