I face the following situation:
We train our models within a Docker container, which is built by running a docker-compose file. I have set up MLflow to work with docker-compose (following something similar to this post: https://towardsdatascience.com/deploy-mlflow-with-docker-compose-8059f16b6039), which adds two more containers: one for the tracking server and one for the PostgreSQL backend.
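For context, the compose setup looks roughly like this (a minimal sketch; service names, image names, and credentials are placeholders, and the MLflow image would need the psycopg2 driver installed to reach Postgres):

version: "3.8"
services:
  db:
    image: postgres:15
    environment:
      POSTGRES_USER: mlflow
      POSTGRES_PASSWORD: mlflow
      POSTGRES_DB: mlflow

  mlflow-server:
    # Placeholder image: needs mlflow plus a Postgres driver (psycopg2)
    image: your-mlflow-image
    command: >
      mlflow server
      --backend-store-uri postgresql://mlflow:mlflow@db:5432/mlflow
      --default-artifact-root /mlruns
      --host 0.0.0.0
    ports:
      - "5000:5000"
    depends_on:
      - db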
However, the story doesn't end there. Our goal is to implement a full ML pipeline, which includes data creation, preprocessing steps and so on. I know that MLflow Projects is meant to help build such a pipeline, and I have seen that it is designed to work with Docker images (https://www.mlflow.org/docs/latest/projects.html), but I don't understand how one could use it with docker-compose.
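For reference, the docs describe an MLproject file along these lines (image name, parameters, and entry point here are placeholders); what I don't see is how this single-image setup maps onto a multi-container compose stack:

# MLproject
name: my_pipeline

docker_env:
  image: my-training-image

entry_points:
  main:
    parameters:
      data_path: {type: string, default: "data/raw"}
    command: "python train.py --data-path {data_path}"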
Could you help me by giving any tips, guidelines, documentation, etc.?
Or, in general, any advice on how a full machine learning pipeline could be implemented using MLflow?
Thanks a lot!
I would suggest training models in a conda environment and only dockerizing for deployment. That way, you can debug model code from an IDE like PyCharm.
So:
conda create -n env_name python
conda run -n env_name pip install -r requirements.txt
Here is how I do it, though it is probably more complicated than you need: https://github.com/bdzyubak/torch-control/blob/main/run_setup_all.py
MLflow works with model training natively; you just need to import it and call autolog:
import mlflow

mlflow.autolog()
mlflow.set_experiment('Energy Use Forecasting')

with mlflow.start_run():
    ...  # [your training code]
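To make that concrete, here is a minimal self-contained sketch with scikit-learn; the synthetic dataset, model, and experiment name are placeholders for your own pipeline:

import mlflow
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

mlflow.autolog()
mlflow.set_experiment('Energy Use Forecasting')

with mlflow.start_run():
    X, y = make_regression(n_samples=200, n_features=5, random_state=42)
    model = RandomForestRegressor(n_estimators=50)
    # autolog captures the params, training metrics, and the fitted model
    model.fit(X, y)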
Then, you would use a single command to pull a registered model down from MLflow and build a Docker image from it:
https://mlflow.org/docs/latest/cli.html
mlflow models build-docker --model-uri "runs:/some-run-uuid/my-model" --name "my-image-name"
# Serve the model
docker run -p 5001:8080 "my-image-name"
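Once the container is up, you can send prediction requests to the scoring server's /invocations endpoint. The column name and values below are placeholders, and this payload layout ("dataframe_split") assumes MLflow 2.x; older servers expect a different JSON format:

curl -X POST http://localhost:5001/invocations \
  -H "Content-Type: application/json" \
  -d '{"dataframe_split": {"columns": ["feature_1"], "data": [[1.0]]}}'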