Tags: docker, continuous-deployment, docker-swarm

Docker Swarm deploy - wait for service/container to be present


EDIT: This has been possible since Docker v26 (2024); see https://github.com/docker/cli/issues/373
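
A minimal sketch of what that can look like, assuming the feature referred to is the --detach flag of docker stack deploy (passing --detach=false makes the command wait for the services to converge instead of returning immediately). The function name deploy_and_wait is hypothetical, and the global --tls option used elsewhere in this question is omitted for brevity:

```shell
# Hypothetical helper: deploy a stack and block until its services converge.
# Assumes a docker CLI recent enough to support --detach on "stack deploy".
deploy_and_wait() {
  compose_file="$1"   # e.g. docker-compose.prod.yml
  stack_name="$2"     # e.g. project-name
  docker stack deploy --detach=false --with-registry-auth --prune \
    -c "$compose_file" "$stack_name"
}
```

Called as deploy_and_wait docker-compose.prod.yml project-name, this should only return once the deploy has converged or failed, which would make a fixed sleep unnecessary.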

I have a working swarm setup with rolling-update deployments. Because I have to run some tasks after each deployment (such as database migrations), I added a "manager" service to the stack. This service is constrained to the manager node, so I always have a way to find it.

To get the current container ID I use this command:
export MANAGER_ID=$(docker --tls ps --filter label=com.docker.swarm.service.name=project-name_php-manager -q)

This works ... but not during deploy.

The stack deploy exits too soon (before the container is up), or even before the manager container gets updated. I also added a sleep 10 before getting the container ID, but the results vary.

Is there a way to wait or to know when a specific service is deployed?

The full deploy looks like this (it runs as a GitLab CI job, but that is not the root of the problem):

deploy:staging:
  variables:
    DOCKER_HOST: "tcp://swarm-manager.hostname.tld:2376"
    DOCKER_CERT_PATH: "/home/gitlab-runner/docker/swarm-manager.hostname.tld"
    VERSION_TAG: "$CI_COMMIT_TAG"
    MYSQL_PROD_PASSWORD: "$MYSQL_PROD_PASSWORD"
    SECRET_TOKEN: "$SECRET_TOKEN"
  script:
    - docker --tls stack deploy -c docker-compose.prod.yml project-name --with-registry-auth --prune
    - sleep 10
    - export MANAGER_ID=$(docker --tls ps --filter label=com.docker.swarm.service.name=project-name_php-manager -q)
    - docker --tls exec -t $MANAGER_ID bin/console doctrine:migrations:migrate --no-interaction --allow-no-migration
  stage: deploy
  environment:
    name: staging
    url: http://projectname.com
  only: [tags]
  cache: ~
  dependencies:
    - build:app
  tags:
    - deploy

Relevant part of docker-compose.prod.yml:

php-manager:
    image: dockerhub.mydomain.tld/namespace/projectname/php:${VERSION_TAG}
    environment:
        DATABASE_URL: "mysql://projectname:${MYSQL_PROD_PASSWORD}@mysql:3306/projectname?charset=utf8mb4&serverVersion=5.7"
        APP_ENV: prod
        APP_SECRET: "${SECRET_TOKEN}"
        VERSION: "${VERSION_TAG}"
        REDIS_HOST: redis
    networks:
      - default
    deploy:
      placement:
        constraints: [node.role == manager]
      replicas: 1
      restart_policy:
        condition: on-failure

Solution

  • docker stack deploy creates tasks that try to bring the system to the state you desire. Some tasks succeed and some fail, and the orchestrator keeps generating new tasks until the system matches the state described in your yml files.

    The bad news: docker stack deploy does not block until the desired state is reached (at least in versions before v26; see the edit in the question).

    Here is how to get the information you want using the docker CLI and basic bash tools (you can surely implement the same thing in any other language).

    In bash you can run docker service ls --format '{{.ID}} {{.Name}}' | grep "${serviceName}" to get the service ID of your service (it is the first of the two words returned).
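
    The lookup above can be sketched as a small function. This is only an illustration: the name service_id is hypothetical, and awk is used instead of grep so that only an exact name match is returned (grep would also match services whose names merely contain the given string):

    ```shell
    # Hypothetical helper: print the ID of the service whose name is exactly $1.
    service_id() {
      docker service ls --format '{{.ID}} {{.Name}}' \
        | awk -v name="$1" '$2 == name { print $1 }'
    }
    ```

    For example, ServiceId=$(service_id project-name_php-manager) would store the ID for use in the following steps.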

    According to the docs, docker service ps does the following:

    List the tasks of one or more services

    It also shows each task's 'current state', which is the information you care about.

    Then run docker service ps "${ServiceId}" --format '{{.CurrentState}} {{.Image}}' | grep "Running.*${newImageName}"

    If this command prints anything, a container is running with your new image. Hurray :)
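
    To actually wait for that condition, the check can be repeated in a loop with a timeout. The following is a rough sketch, not a definitive implementation: the function name wait_for_image and its timeout and interval parameters are hypothetical, and it assumes the {{.Image}} column prints the image reference exactly as it was deployed (without --no-trunc):

    ```shell
    # Hypothetical helper: poll until a task of service $1 is Running with image $2.
    # $3 = timeout in seconds (default 120), $4 = poll interval (default 2).
    # Returns 0 once such a task is seen, 1 if the timeout expires first.
    wait_for_image() {
      wfi_service="$1"; wfi_image="$2"
      wfi_timeout="${3:-120}"; wfi_interval="${4:-2}"
      wfi_elapsed=0
      while [ "$wfi_elapsed" -lt "$wfi_timeout" ]; do
        # CurrentState looks like "Running 5 seconds ago", so the first field
        # is the state and the last field is the image.
        if docker service ps "$wfi_service" --filter desired-state=running \
             --format '{{.CurrentState}} {{.Image}}' \
           | awk -v img="$wfi_image" \
               '$1 == "Running" && $NF == img { found = 1 } END { exit !found }'
        then
          return 0
        fi
        sleep "$wfi_interval"
        wfi_elapsed=$((wfi_elapsed + wfi_interval))
      done
      return 1
    }
    ```

    In a deploy script this could replace a fixed sleep, for example wait_for_image "$ServiceId" "$newImageName" 180 || exit 1, so the migration step only runs once the new container is up.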

    I hope this introduces you to all the tools you need. docker service ps is also helpful for finding out why a task failed.
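
    As an illustration of that last point, the sketch below lists only the tasks that report an error, using the {{.Error}} column of docker service ps. The function name failed_tasks and the '|' separator are arbitrary choices made for this example:

    ```shell
    # Hypothetical helper: print "name|state|error" for every task of service $1
    # that has a non-empty error message.
    failed_tasks() {
      docker service ps "$1" --no-trunc \
        --format '{{.Name}}|{{.CurrentState}}|{{.Error}}' \
        | awk -F'|' '$3 != "" { print }'
    }
    ```

    Running failed_tasks "$ServiceId" after a deploy that never becomes healthy should show the error text of the failed tasks.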

    FYI: The possible values of task state according to the Swarm task states documentation are:

    NEW The task was initialized.

    PENDING Resources for the task were allocated.

    ASSIGNED Docker assigned the task to nodes.

    ACCEPTED The task was accepted by a worker node. If a worker node rejects the task, the state changes to REJECTED.

    PREPARING Docker is preparing the task.

    STARTING Docker is starting the task.

    RUNNING The task is executing.

    COMPLETE The task exited without an error code.

    FAILED The task exited with an error code.

    SHUTDOWN Docker requested the task to shut down.

    REJECTED The worker node rejected the task.

    ORPHANED The node was down for too long.
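
    When polling, it can help to tell "still starting" apart from "will never start". A minimal sketch that groups the states listed above, assuming the first word of {{.CurrentState}} is the state name. The function name classify_state and the three group labels are hypothetical; READY does not appear in the list above but is included because Docker's task state documentation also describes it:

    ```shell
    # Hypothetical helper: map a CurrentState string such as
    # "Running 5 seconds ago" to one of: running, in-progress, stopped, unknown.
    classify_state() {
      cs_word=$(printf '%s\n' "$1" | awk '{ print toupper($1) }')
      case "$cs_word" in
        RUNNING)
          echo running ;;
        NEW|PENDING|ASSIGNED|ACCEPTED|READY|PREPARING|STARTING)
          echo in-progress ;;
        COMPLETE|FAILED|SHUTDOWN|REJECTED|ORPHANED)
          echo stopped ;;
        *)
          echo unknown ;;
      esac
    }
    ```

    A wait loop could use this to abort early when the newest task for the new image is classified as stopped instead of sleeping until the timeout.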