Is there anything in the Python API that lets you alter the artifact subdirectories? For example, I have a .json file stored here:
s3://mlflow/3/1353808bf7324824b7343658882b1e45/artifacts/feature_importance_split.json
MlFlow creates a 3/
key in s3. Is there a way to change to modify this key to something else (a date or the name of the experiment)?
As I commented above, yes, mlflow.create_experiment()
does allow you set the artifact location using the artifact_location
parameter.
However, sort of related, the problem with setting the artifact_location
using the create_experiment()
function is that once you create a experiment, MLflow will throw an error if you run the create_experiment()
function again.
I didn't see this in the docs but it's confirmed that if an experiment already exists in the backend-store, MlFlow will not allow you to run the same create_experiment()
function again. And as of this post, MLfLow does not have check_if_exists
flag or a create_experiments_if_not_exists()
function.
To make things more frustrating, you cannot set the artifcact_location
in the set_experiment()
function either.
So here is a pretty easy work around, it also avoids the "ERROR mlflow.utils.rest_utils..." stdout logging as well. :
import os
from random import random, randint
from mlflow import mlflow,log_metric, log_param, log_artifacts
from mlflow.exceptions import MlflowException
try:
experiment = mlflow.get_experiment_by_name('oof')
experiment_id = experiment.experiment_id
except AttributeError:
experiment_id = mlflow.create_experiment('oof', artifact_location='s3://mlflow-minio/sample/')
with mlflow.start_run(experiment_id=experiment_id) as run:
mlflow.set_tracking_uri('http://localhost:5000')
print("Running mlflow_tracking.py")
log_param("param1", randint(0, 100))
log_metric("foo", random())
log_metric("foo", random() + 1)
log_metric("foo", random() + 2)
if not os.path.exists("outputs"):
os.makedirs("outputs")
with open("outputs/test.txt", "w") as f:
f.write("hello world!")
log_artifacts("outputs")
If it is the user's first time creating the experiment, the code will run into an AttributeError since experiment_id
does not exist and the except
code block gets executed creating the experiment.
If it is the second, third, etc the code is run, it will only execute the code under the try
statement since the experiment now exists. Mlflow will now create a 'sample' key in your s3 bucket. Not fully tested but it works for me at least.