We are using Symfony Messenger in combination with supervisor running in a Docker container on AWS ECS. We noticed the worker is not shut down gracefully. After debugging, it appears to work as expected when using APP_ENV=dev, but not when APP_ENV=prod.
I made a simple sleepMessage, which sleeps for 1 second and then prints a message, repeated for 60 seconds. This is the result when running with APP_ENV=dev:
As you can see it's clearly waiting for the program to stop running.
Now with APP_ENV=prod:
It stops immediately without waiting.
In the Dockerfile we have configured the following to start supervisor. It's based on php:8.1-apache, so that's why STOPSIGNAL has been configured:
RUN apt-get update && apt-get install -y --no-install-recommends \
# for supervisor
python \
supervisor
The start-worker.sh script contains this
#!/usr/bin/env bash
cp config/worker/messenger-worker.conf ../../../etc/supervisor/supervisord.conf
exec /usr/bin/supervisord
We do this because certain env variables are only available when starting up. For debugging purposes the config has been hardcoded to test. Below is the messenger-worker.conf
[unix_http_server]
file=/tmp/supervisor.sock
[supervisord]
nodaemon=true ; start in foreground if true; default false
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface
[program:messenger-consume]
stderr_logfile_maxbytes=0
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
command=bin/console messenger:consume async -vv --env=prod --time-limit=129600
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true
numprocs=1
environment=
    MESSENGER_TRANSPORT_DSN="https://sqs.eu-central-1.amazonaws.com/{id}/dev-symfony-messenger-queue"
So in short, when using --env=prod in the config above it doesn't wait for the worker to stop, while with --env=dev it does. Does anybody know how to solve this?
Turns out it was related to the wait_time option of the SQS transport. It probably caused a request that was started just before the container exited, with the response coming back when the container no longer existed. Setting wait_time to 0 fixed that problem.
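For reference, here is a minimal sketch of what that change could look like. It assumes the option is passed as a query parameter on the DSN in the supervisor config from the question; it can also be set under the transport's options in messenger.yaml. The queue URL is the placeholder from the question, and only the relevant lines of the program section are shown.

```ini
[program:messenger-consume]
; Assumption: wait_time is given as a DSN query parameter.
; 0 turns off SQS long polling, so no receive request stays open
; while the container is being stopped.
environment=MESSENGER_TRANSPORT_DSN="https://sqs.eu-central-1.amazonaws.com/{id}/dev-symfony-messenger-queue?wait_time=0"
```

Keep in mind that with wait_time at 0 the worker uses short polling, so each receive call returns right away and you will likely see more (often empty) requests to SQS.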
Then there was this which could lead to the same issue