
Changing subdirectory of MLflow artifact store


Is there anything in the Python API that lets you alter the artifact subdirectories? For example, I have a .json file stored here:

s3://mlflow/3/1353808bf7324824b7343658882b1e45/artifacts/feature_importance_split.json

MLflow creates a 3/ key in S3. Is there a way to change this key to something else (a date, or the name of the experiment)?


Solution

  • As I commented above, yes, mlflow.create_experiment() does allow you to set the artifact location using the artifact_location parameter.

    However, the problem with setting the artifact_location via create_experiment() is that once you create an experiment, MLflow will throw an error if you call create_experiment() with the same name again.

    I didn't see this in the docs, but it's confirmed that if an experiment already exists in the backend store, MLflow will not allow you to run the same create_experiment() call again. And as of this post, MLflow has neither a check_if_exists flag nor a create_experiments_if_not_exists() function.

    To make things more frustrating, you cannot set the artifact_location in the set_experiment() function either.

    So here is a fairly easy workaround; it also avoids the "ERROR mlflow.utils.rest_utils..." log output:

    import os
    from random import random, randint
    
    import mlflow
    from mlflow import log_metric, log_param, log_artifacts
    
    # Point the client at the tracking server before looking up experiments.
    mlflow.set_tracking_uri('http://localhost:5000')
    
    try:
        experiment = mlflow.get_experiment_by_name('oof')
        experiment_id = experiment.experiment_id
    except AttributeError:
        # get_experiment_by_name() returned None, so the experiment does
        # not exist yet -- create it with a custom artifact location.
        experiment_id = mlflow.create_experiment(
            'oof', artifact_location='s3://mlflow-minio/sample/')
    
    with mlflow.start_run(experiment_id=experiment_id) as run:
        print("Running mlflow_tracking.py")
    
        log_param("param1", randint(0, 100))
    
        log_metric("foo", random())
        log_metric("foo", random() + 1)
        log_metric("foo", random() + 2)
    
        if not os.path.exists("outputs"):
            os.makedirs("outputs")
        with open("outputs/test.txt", "w") as f:
            f.write("hello world!")
    
        log_artifacts("outputs")
    

    The first time the experiment is created, the code hits an AttributeError: get_experiment_by_name() returns None, and None has no experiment_id attribute, so the except block runs and creates the experiment.

    On the second and subsequent runs, only the try block executes, since the experiment now exists. MLflow will now create a 'sample' key in your S3 bucket. Not fully tested, but it works for me at least.
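
    The try/except above is just a get-or-create pattern. As a minimal stand-in sketch of that pattern (using a plain dict in place of the MLflow backend store; `registry` and `get_or_create_experiment` are invented names for illustration, not MLflow API):

    ```python
    # Invented stand-in for the MLflow backend store.
    registry = {}

    def get_or_create_experiment(name, artifact_location=None):
        """Return the id of an existing entry, or create it exactly once."""
        try:
            # Analogous to mlflow.get_experiment_by_name(): a missing name
            # fails here (KeyError for a dict; AttributeError in the MLflow
            # snippet, where None.experiment_id is accessed).
            experiment_id = registry[name]["id"]
        except (KeyError, AttributeError):
            # Analogous to mlflow.create_experiment().
            experiment_id = len(registry)
            registry[name] = {"id": experiment_id,
                              "artifact_location": artifact_location}
        return experiment_id
    ```

    Calling it twice with the same name returns the same id instead of raising, which is exactly what the MLflow snippet achieves across repeated script runs.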