Tags: docker, mlflow, docker-in-docker

MLflow run within a docker container - Running with "docker_env" in MLflow project file


We are trying to develop an MLflow pipeline. Our development environment lives in a series of Docker containers (no local Python environment whatsoever): we have set up a Docker container with MLflow and all the requirements necessary to run pipelines. The issue is that when we write our MLflow project file we need to use "docker_env" to specify the environment. This figure illustrates what we want to achieve:

[Figure: MLflow run with docker-in-docker]

MLflow inside the container needs access to the Docker daemon so that it can either use the Docker image named in the MLflow project file or pull it from Docker Hub. We are aware of the possibility of using "conda_env" in the MLflow project file but wish to avoid this.
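For reference, a project file using "docker_env" looks roughly like the sketch below; the project name, image name, entry point, and parameter are placeholders, not part of our actual setup:

    # MLproject
    name: example-pipeline

    docker_env:
      # Image must exist locally or be pullable from a registry.
      image: example-pipeline-image

    entry_points:
      main:
        parameters:
          alpha: {type: float, default: 0.1}
        command: "python train.py --alpha {alpha}"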

Our questions are:

1. Do we need to set up some sort of "Docker-in-Docker" solution to achieve our goal?

2. Is it possible to set up the Docker container in which MLflow is running so that it can access the host machine's Docker daemon?

I have been all over Google and MLflow's documentation, but I can't seem to find anything that can guide us. Thanks a lot in advance for any help or pointers!


Solution

  • I managed to create my pipeline using Docker and docker_env in MLflow. It is not necessary to run Docker-in-Docker; the "sibling containers" approach is sufficient. This approach is described here:

    https://jpetazzo.github.io/2015/09/03/do-not-use-docker-in-docker-for-ci/

    and it is the preferred way to avoid Docker-in-Docker: instead of nesting a daemon inside the container, you bind-mount the host's Docker socket into the MLflow container, as sketched below.

    One needs to be very careful when mounting volumes between the primary (MLflow) and secondary (project) containers: because the sibling container is started by the host's daemon, all volume mounts are resolved on the host machine, so mount sources must refer to host paths (see the second sketch below).
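A minimal sketch of the sibling setup, assuming an image (here called "my-mlflow-image") with MLflow and the Docker CLI installed: bind-mounting /var/run/docker.sock makes the Docker client inside the container talk to the host daemon, so project containers are started as siblings on the host rather than nested inside the MLflow container.

    # Start the primary MLflow container with access to the host daemon.
    docker run -it \
        -v /var/run/docker.sock:/var/run/docker.sock \
        -v "$(pwd)":/workspace \
        -w /workspace \
        my-mlflow-image \
        mlflow run . -e main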
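To illustrate the volume caveat (all paths here are hypothetical): a mount source that only exists inside the MLflow container will come up empty or fail, because the host daemon resolves it against the host filesystem.

    # WRONG: /workspace exists only inside the MLflow container; the host
    # daemon resolves the source path on the host, where it does not exist.
    docker run -v /workspace/data:/data example-pipeline-image

    # RIGHT: use the host directory that backs /workspace in the MLflow
    # container (here assumed to be /home/user/project).
    docker run -v /home/user/project/data:/data example-pipeline-image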