I’m encountering an issue when running Celery with PgBouncer and PostgreSQL after enabling idle connection timeouts.
My stack includes:
Django (served via Tornado)
Celery (workers + beat)
PostgreSQL
PgBouncer (in front of PostgreSQL)
Due to a large number of idle database connections caused by Tornado + Django, I introduced idle timeout settings to protect PostgreSQL from running out of connections.
PgBouncer (values in seconds):
idle_transaction_timeout=240 (4 minutes)
client_idle_timeout=240 (4 minutes)
PostgreSQL (values in milliseconds):
idle_in_transaction_session_timeout=300000 (5 minutes)
idle_session_timeout=300000 (5 minutes)
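The two layers use different units: PgBouncer's timeouts are given in seconds, while PostgreSQL reads a bare number for these two settings as milliseconds. As a rough sketch, the values above can be normalised to seconds and sorted to see which limits are shortest. The helper name timeouts_in_seconds is illustrative only and is not part of either tool.

```python
# Values copied from the settings listed above.
PGBOUNCER_SECONDS = {
    "idle_transaction_timeout": 240,
    "client_idle_timeout": 240,
}
POSTGRES_MILLISECONDS = {
    "idle_in_transaction_session_timeout": 300000,
    "idle_session_timeout": 300000,
}


def timeouts_in_seconds():
    """Return (name, seconds) pairs for all four settings, shortest first."""
    merged = {}
    for name, seconds in PGBOUNCER_SECONDS.items():
        merged["pgbouncer." + name] = float(seconds)
    for name, millis in POSTGRES_MILLISECONDS.items():
        merged["postgresql." + name] = millis / 1000.0
    # Sort by duration, then by name so ties come out in a stable order.
    return sorted(merged.items(), key=lambda item: (item[1], item[0]))


for name, seconds in timeouts_in_seconds():
    print("%-50s %4.0f s" % (name, seconds))
```

With these values the PgBouncer limits (240 s) are shorter than the PostgreSQL ones (300 s), which is consistent with the error below naming client_idle_timeout.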
Problem:
After applying these settings, Celery occasionally crashes with the following error:
[2025-12-16 06:12:01,578: ERROR/MainProcess] Unrecoverable error: DatabaseError('client_idle_timeout\nserver closed the connection unexpectedly\n\tThis probably means the server terminated abnormally\n\tbefore or while processing the request.\n',)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/celery/worker/__init__.py", line 351, in start
    component.start()
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer.py", line 393, in start
    self.consume_messages()
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer.py", line 885, in consume_messages
    self.connection.drain_events(timeout=10.0)
  File "/usr/local/lib/python2.7/site-packages/kombu/connection.py", line 276, in drain_events
    return self.transport.drain_events(self.connection, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/virtual/__init__.py", line 760, in drain_events
    item, channel = get(timeout=timeout)
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/virtual/scheduling.py", line 39, in get
    return self.fun(resource, **kwargs), resource
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/virtual/__init__.py", line 780, in _drain_channel
    return channel.drain_events(timeout=timeout)
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/virtual/__init__.py", line 578, in drain_events
    return self._poll(self.cycle, timeout=timeout)
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/virtual/__init__.py", line 287, in _poll
    return cycle.get()
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/virtual/scheduling.py", line 39, in get
    return self.fun(resource, **kwargs), resource
  File "/usr/local/lib/python2.7/site-packages/djkombu/transport.py", line 31, in _get
    m = Queue.objects.fetch(queue)
  File "/usr/local/lib/python2.7/site-packages/djkombu/managers.py", line 18, in fetch
    queue = self.get(name=queue_name)
  File "/usr/local/lib/python2.7/site-packages/django/db/models/manager.py", line 132, in get
    return self.get_query_set().get(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/django/db/models/query.py", line 344, in get
    num = len(clone)
  File "/usr/local/lib/python2.7/site-packages/django/db/models/query.py", line 82, in __len__
    self._result_cache = list(self.iterator())
  File "/usr/local/lib/python2.7/site-packages/django/db/models/query.py", line 273, in iterator
    for row in compiler.results_iter():
  File "/usr/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 680, in results_iter
    for rows in self.execute_sql(MULTI):
  File "/usr/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 735, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python2.7/site-packages/django/db/backends/postgresql_psycopg2/base.py", line 44, in execute
    return self.cursor.execute(query, args)
DatabaseError: client_idle_timeout
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
[2025-12-16 06:12:02,291: INFO/MainProcess] Celerybeat: Shutting down...
Questions:
Is this a known issue when using Celery with PgBouncer idle timeouts?
Are these timeout values incompatible with long-running Celery workers?
What is the recommended way to configure PgBouncer/PostgreSQL idle timeouts when Celery is involved?
Any guidance or best practices would be greatly appreciated. Thanks in advance!
We ran into this because database connections and transactions were being held open for too long. When a connection sits idle past the configured limit, PgBouncer (or PostgreSQL itself) closes it to free resources. The client_idle_timeout text at the start of the error message is PgBouncer reporting that it did exactly this, so the behavior is expected and not a fault in the server.
In our case, a Celery task opened a database transaction early and then ran long-running logic (for example, calls to external APIs) while that transaction was still open. By the time the task tried to commit at the end, the connection had already been closed by the idle timeout, and the commit failed with the error above.
This is very similar to the problem described here:
Celery task fails midway while pulling data from a large database
Solution:
Rework the transaction lifecycle:
Open database transactions only when they are actually needed
Keep transactions as short as possible
Avoid wrapping long-running or external operations inside a database transaction
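The three points above can be sketched as follows. This is a minimal illustration, not our actual task code: the standard-library sqlite3 module stands in for PostgreSQL behind PgBouncer so the example runs on its own, and process_order, call_external_api, short_transaction and the orders table are all hypothetical names. In Django, the same shape would wrap only the write step in the framework's transaction block (transaction.atomic on Django 1.6+, or commit_on_success on older releases such as the one the traceback appears to come from).

```python
import sqlite3
import time
from contextlib import contextmanager


@contextmanager
def short_transaction(conn):
    """Commit on success, roll back on error; the transaction lasts only for the block."""
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise


def call_external_api(order_id):
    """Hypothetical slow call, standing in for an HTTP request to another service."""
    time.sleep(0.01)
    return "shipped"


def process_order(conn, order_id):
    # 1. Read what is needed and finish the transaction straight away.
    with short_transaction(conn):
        row = conn.execute(
            "SELECT id FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
    if row is None:
        return False

    # 2. Do the slow work while no transaction is open, so an idle
    #    timeout cannot kill a half-finished transaction.
    status = call_external_api(order_id)

    # 3. Open a transaction only for the brief write at the end.
    with short_transaction(conn):
        conn.execute(
            "UPDATE orders SET status = ? WHERE id = ?", (status, order_id)
        )
    return True


# Usage with an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO orders (id, status) VALUES (1, 'new')")
conn.commit()
process_order(conn, 1)
```

The anti-pattern we had was the reverse: one transaction opened before step 1 and committed after step 3, which left the connection idle in a transaction for the whole duration of the external call.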