python mlflow

Continue stopped run in MLflow


We run our experiments on AWS spot instances. Sometimes the experiments are interrupted, and we would prefer to continue logging to the same run. How can I set the run_id of the active run?

Something like this pseudocode (not working):

if new:
    mlflow.start_run(experiment_id=1, run_name=x)
else:
    mlflow.set_run(run_id)

Solution

  • You can pass the run_id directly to start_run:

    import mlflow

    # run_id holds the ID of the interrupted run; pass None to start a new run
    run = mlflow.start_run(experiment_id=1,
                           run_name=x,
                           run_id=run_id)
    

    Of course, you have to store the run_id somewhere that survives the interruption. You can read it from the object returned by mlflow.start_run, as run.info.run_id.
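
One way to store and reuse the run_id is to write it to a small file, sketched below. This is only an illustration, not an MLflow feature: the file name mlflow_run_id.txt and the helpers load_run_id, save_run_id and start_or_resume_run are hypothetical names, and the sketch assumes the file lives on storage that outlives the spot instance.

```python
from pathlib import Path
from typing import Optional

# Hypothetical location. On a spot instance this must be on storage that
# survives the interruption (for example a mounted volume), not ephemeral disk.
RUN_ID_FILE = Path("mlflow_run_id.txt")


def load_run_id(path: Path = RUN_ID_FILE) -> Optional[str]:
    """Return the saved run_id, or None if no run has been recorded yet."""
    if path.exists():
        return path.read_text().strip() or None
    return None


def save_run_id(run_id: str, path: Path = RUN_ID_FILE) -> None:
    """Persist the run_id so a restarted process can find it."""
    path.write_text(run_id)


def start_or_resume_run(path: Path = RUN_ID_FILE, **new_run_kwargs):
    """Resume the recorded run if there is one, otherwise start a new run.

    new_run_kwargs (e.g. experiment_id, run_name) are only passed along
    when a new run is created.
    """
    import mlflow  # imported lazily so the helpers above work without MLflow

    run_id = load_run_id(path)
    if run_id is not None:
        run = mlflow.start_run(run_id=run_id)
    else:
        run = mlflow.start_run(**new_run_kwargs)
    save_run_id(run.info.run_id, path)
    return run
```

It could then be used as `with start_or_resume_run(experiment_id="1", run_name="my-run"): mlflow.log_metric("loss", 0.1)`, so the same code path covers both the first start and every restart. Remember to delete the file once the experiment has finished, otherwise the next experiment would log into the old run.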