I'm a bit new to Docker and I'm trying to simulate a cluster environment with it. I have defined a custom Docker network that the containers share, and I map each container to a different host port to simulate different network cards.
Currently, I have a working Dockerfile that copies over the needed SSH keys, and I have it start the SSH server automatically with ENTRYPOINT service ssh start && bash.
Right now my containers work, but the inconvenience is that when the containers start I have to manually run eval `ssh-agent` && ssh-add /root/.ssh/docker_id_rsa, then manually ssh into each of the other containers, and only then am I able to run my MPI program. If I don't do these steps first, I cannot run the program across the containers.
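Concretely, the manual sequence inside container1 looks something like this (the mpirun line is just a placeholder for my actual program):
eval `ssh-agent`
ssh-add /root/.ssh/docker_id_rsa
# ssh once into each of the other containers so the connections are established
ssh root@container2 exit
ssh root@container3 exit
ssh root@container4 exit
# only now does the MPI run work
mpirun -np 4 --hostfile hostfile ./my_mpi_program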
So what I'd like to do is when I attach to one of the containers, I want to either (1) immediately run my MPI program across all of the containers without having to run all the steps I mentioned above, or (2) even just immediately ssh into the other containers, and then run my program.
Here is an example of my current Dockerfile:
FROM img_base AS img
COPY /keys/ /root/.ssh
COPY /keys/docker_id_rsa.pub /root/.ssh/authorized_keys
RUN sed -i 's/#PermitRootLogin no/PermitRootLogin yes/g' /etc/ssh/sshd_config
RUN sed -i 's/#PasswordAuthentication yes/PasswordAuthentication no/g' /etc/ssh/sshd_config
RUN sed -i "s+StrictHostKeyChecking .*+StrictHostKeyChecking allow-new+" /etc/ssh/sshd_config
RUN echo "localhost" >> hostfile
RUN echo "root@container2" >> hostfile
RUN echo "root@container3" >> hostfile
RUN echo "root@container4" >> hostfile
EXPOSE 22
ENTRYPOINT service ssh start && bash && eval `ssh-agent` && ssh-add /root/.ssh/docker_id_rsa
I start my containers with the following bash script:
#!/bin/bash
docker run --rm -dit --name container1 --network=my-net --ip=172.18.0.2 -p 4022:22 --add-host container2:172.18.0.3 --add-host container3:172.18.0.4 --add-host container4:172.18.0.5 img
docker run --rm -dit --name container2 --network=my-net --ip=172.18.0.3 -p 3022:22 --add-host container1:172.18.0.2 --add-host container3:172.18.0.4 --add-host container4:172.18.0.5 img
docker run --rm -dit --name container3 --network=my-net --ip=172.18.0.4 -p 5022:22 --add-host container2:172.18.0.3 --add-host container1:172.18.0.2 --add-host container4:172.18.0.5 img
docker run --rm -dit --name container4 --network=my-net --ip=172.18.0.5 -p 6022:22 --add-host container2:172.18.0.3 --add-host container3:172.18.0.4 --add-host container1:172.18.0.2 img
docker attach container1
I have tried adding the eval and ssh-add commands to the ENTRYPOINT command. I've also tried adding these commands to the docker run commands in the bash script.
And I've tried to do this with a docker-compose file, but I still don't really understand how to use Docker Compose.
Any advice or references on the proper way to do this would be greatly appreciated.
I'm not sure what your img_base looks like, but I'll just assume that it's an Ubuntu image (or a derivative).
You are setting up SSH access to the containers as the root user. This is not ideal, but it's 100% fine for getting things up and running. Perhaps change to a non-privileged user later?
🗎 Dockerfile
FROM ubuntu:22.04 AS img
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
apt-get install -y openssh-server && \
mkdir /var/run/sshd
RUN sed -i 's/#PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config
COPY keys/docker_id_rsa.pub /root/.ssh/authorized_keys
RUN chmod 700 /root/.ssh && chmod 600 /root/.ssh/authorized_keys
CMD ["/usr/sbin/sshd", "-D"]
Testing the image. I'm connecting to port 2022 on the host to avoid a conflict with the SSH daemon already running on the host.
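Something along these lines (the img tag matches your run script; generating the key pair and the ssh-test container name are just for this check):
# Create a key pair for the containers if you don't already have one.
mkdir -p keys && ssh-keygen -t rsa -N "" -f keys/docker_id_rsa
docker build -t img .
docker run --rm -d -p 2022:22 --name ssh-test img
ssh -i keys/docker_id_rsa -p 2022 root@localhost hostname
docker stop ssh-test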
SSH connection confirmed. ✅
Now let's get this working with Docker Compose.
🗎 Dockerfile
FROM ubuntu:22.04 AS img
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
apt-get install -y openssh-server && \
mkdir /var/run/sshd
# SSH server configuration.
RUN sed -i 's/#PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config
# SSH client configuration.
RUN echo " StrictHostKeyChecking no" >> /etc/ssh/ssh_config
RUN echo " UserKnownHostsFile /dev/null" >> /etc/ssh/ssh_config
COPY keys /root/.ssh/
COPY keys/docker_id_rsa.pub /root/.ssh/authorized_keys
RUN chmod 700 /root/.ssh && chmod 600 /root/.ssh/authorized_keys /root/.ssh/docker_id_rsa
COPY setup.sh /setup.sh
RUN chmod +x /setup.sh
CMD ["/usr/sbin/sshd", "-D"]
🗎 docker-compose.yml
version: '3.7'

x-common-service: &common-service-template
  build:
    context: .
    dockerfile: Dockerfile
  networks:
    - my-net

services:
  container1:
    <<: *common-service-template
    container_name: container1
    ports:
      - "4022:22"
    command: /bin/bash -c "/setup.sh"
  container2:
    <<: *common-service-template
    container_name: container2
    ports:
      - "3022:22"
  container3:
    <<: *common-service-template
    container_name: container3
    ports:
      - "5022:22"
  container4:
    <<: *common-service-template
    container_name: container4
    ports:
      - "6022:22"

networks:
  my-net:
The container1 service is slightly different because it runs the setup.sh script. This script (see below) will run code on the other three containers via SSH, so you can use it to set up all of the containers. For the moment, though, it just prints a message on each of them. Note that Compose's DNS on the shared network lets the containers reach each other by service name (container2, container3, container4), so none of the --add-host entries from your docker run script are needed.
🗎 setup.sh
#!/bin/bash
echo "* Setting up cluster."
ssh -i ~/.ssh/docker_id_rsa root@container2 'echo "- Running code on container2! $(hostname)"'
ssh -i ~/.ssh/docker_id_rsa root@container3 'echo "- Running code on container3! $(hostname)"'
ssh -i ~/.ssh/docker_id_rsa root@container4 'echo "- Running code on container4! $(hostname)"'
echo "* Done!"
# Keep the container running by starting sshd in the foreground.
/usr/sbin/sshd -D
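Once this works, you can replace the echo commands with whatever actually prepares your nodes and launch the MPI job from the same script. A rough sketch of what setup.sh could become, assuming your base image already provides mpirun, that a hostfile listing the four containers exists at /hostfile, and that /path/to/my_mpi_program is a placeholder:
#!/bin/bash
echo "* Setting up cluster."
# Load the key into an agent so the ssh connections that mpirun opens can use it.
eval `ssh-agent`
ssh-add /root/.ssh/docker_id_rsa
# Launch across all four containers; --allow-run-as-root is only needed for Open MPI.
mpirun --allow-run-as-root -np 4 --hostfile /hostfile /path/to/my_mpi_program
# Keep container1 alive and reachable over SSH afterwards.
/usr/sbin/sshd -D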
Launch.
docker-compose build && docker-compose up
So container1 is effectively acting as the master and setting things up on the other containers.
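If you want to poke around or re-run commands after startup, you can still get a shell on the master, for example:
docker-compose exec container1 bash
# or check that a worker is reachable:
docker-compose exec container1 ssh -i /root/.ssh/docker_id_rsa root@container2 hostname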