Tags: python, django, postgresql, celery, pgbouncer

Celery crashes when PgBouncer closes idle connections (idle timeouts enabled)


I’m encountering an issue when running Celery with PgBouncer and PostgreSQL after enabling idle connection timeouts.

My stack includes:

Due to a large number of idle database connections caused by Tornado + Django, I introduced idle timeout settings to protect PostgreSQL from running out of connections.

PgBouncer

idle_transaction_timeout=240 (seconds, 4 min)
client_idle_timeout=240 (seconds, 4 min)

PostgreSQL

idle_in_transaction_session_timeout=300000 (milliseconds, 5 min)
idle_session_timeout=300000 (milliseconds, 5 min)
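
Since the PgBouncer values are in seconds and the PostgreSQL values are in milliseconds, here is a small sketch that puts them on one scale to see which limit is reached first. The helper names (`timeouts_in_seconds`, `shortest`) and the `pgbouncer.` / `postgresql.` prefixes are made up for illustration; only the setting names and values come from the configuration above.

```python
# Values copied from the settings above.
PGBOUNCER_SECONDS = {
    "idle_transaction_timeout": 240,
    "client_idle_timeout": 240,
}
POSTGRES_MILLISECONDS = {
    "idle_in_transaction_session_timeout": 300000,
    "idle_session_timeout": 300000,
}


def timeouts_in_seconds(pgbouncer, postgres):
    """Merge both configs into one dict, with every value in seconds."""
    merged = {}
    for name, seconds in pgbouncer.items():
        merged["pgbouncer." + name] = float(seconds)
    for name, millis in postgres.items():
        merged["postgresql." + name] = millis / 1000.0
    return merged


def shortest(timeouts):
    """Return the smallest limit and the (sorted) settings that share it."""
    limit = min(timeouts.values())
    return limit, sorted(n for n, v in timeouts.items() if v == limit)


if __name__ == "__main__":
    limit, names = shortest(
        timeouts_in_seconds(PGBOUNCER_SECONDS, POSTGRES_MILLISECONDS)
    )
    print("first limit reached after %.0f s: %s" % (limit, ", ".join(names)))
```

With these numbers the PgBouncer limits are the shorter ones, which would be consistent with the error below naming client_idle_timeout rather than a PostgreSQL setting.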

Problem:

After applying these settings, Celery occasionally crashes with the following error:

[2025-12-16 06:12:01,578: ERROR/MainProcess] Unrecoverable error: DatabaseError('client_idle_timeout\nserver closed the connection unexpectedly\n\tThis probably means the server terminated abnormally\n\tbefore or while processing the request.\n',)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/celery/worker/__init__.py", line 351, in start
    component.start()
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer.py", line 393, in start
    self.consume_messages()
  File "/usr/local/lib/python2.7/site-packages/celery/worker/consumer.py", line 885, in consume_messages
    self.connection.drain_events(timeout=10.0)
  File "/usr/local/lib/python2.7/site-packages/kombu/connection.py", line 276, in drain_events
    return self.transport.drain_events(self.connection, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/virtual/__init__.py", line 760, in drain_events
    item, channel = get(timeout=timeout)
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/virtual/scheduling.py", line 39, in get
    return self.fun(resource, **kwargs), resource
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/virtual/__init__.py", line 780, in _drain_channel
    return channel.drain_events(timeout=timeout)
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/virtual/__init__.py", line 578, in drain_events
    return self._poll(self.cycle, timeout=timeout)
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/virtual/__init__.py", line 287, in _poll
    return cycle.get()
  File "/usr/local/lib/python2.7/site-packages/kombu/transport/virtual/scheduling.py", line 39, in get
    return self.fun(resource, **kwargs), resource
  File "/usr/local/lib/python2.7/site-packages/djkombu/transport.py", line 31, in _get
    m = Queue.objects.fetch(queue)
  File "/usr/local/lib/python2.7/site-packages/djkombu/managers.py", line 18, in fetch
    queue = self.get(name=queue_name)
  File "/usr/local/lib/python2.7/site-packages/django/db/models/manager.py", line 132, in get
    return self.get_query_set().get(*args, **kwargs)
  File "/usr/local/lib/python2.7/site-packages/django/db/models/query.py", line 344, in get
    num = len(clone)
  File "/usr/local/lib/python2.7/site-packages/django/db/models/query.py", line 82, in __len__
    self._result_cache = list(self.iterator())
  File "/usr/local/lib/python2.7/site-packages/django/db/models/query.py", line 273, in iterator
    for row in compiler.results_iter():
  File "/usr/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 680, in results_iter
    for rows in self.execute_sql(MULTI):
  File "/usr/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 735, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python2.7/site-packages/django/db/backends/postgresql_psycopg2/base.py", line 44, in execute
    return self.cursor.execute(query, args)
DatabaseError: client_idle_timeout
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

[2025-12-16 06:12:02,291: INFO/MainProcess] Celerybeat: Shutting down...

Questions:

Any guidance or best practices would be greatly appreciated. Thanks in advance!


Solution

  • We encountered this issue because database connections and transactions were being kept open for too long. When a connection stays idle past the configured timeout, PgBouncer (or the database itself) closes it to protect resources and avoid the overhead of idle clients. This is expected behavior, not a fault.

    In our case, a Celery task opened a database transaction early, then performed long-running logic (for example, calling external APIs). By the time the task tried to commit the transaction at the end, the connection had already been terminated due to idle timeout, which caused the crash.
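
The failure mode described above can be reproduced without a real database. The following is only a simulation sketch, not our production code: `FakeClock`, `FakePooledConnection`, `ConnectionClosed`, `slow_external_call` and both task functions are hypothetical stand-ins, and the 600-second delay is an arbitrary value chosen to exceed the 240-second timeout. It contrasts a task that holds a transaction across slow work with one that finishes the slow work first and only then connects and commits.

```python
class ConnectionClosed(Exception):
    """Raised when the (simulated) pooler has already dropped the connection."""


class FakeClock(object):
    """Manually advanced clock, so the example runs instantly."""

    def __init__(self):
        self.now = 0.0

    def sleep(self, seconds):
        self.now += seconds


class FakePooledConnection(object):
    """Hypothetical stand-in for a connection sitting behind PgBouncer."""

    def __init__(self, clock, idle_timeout=240):
        self.clock = clock
        self.idle_timeout = idle_timeout
        self.last_activity = clock.now
        self.committed = []
        self._pending = []

    def _touch(self):
        # Any use after too long a silence fails, like the error in the question.
        if self.clock.now - self.last_activity > self.idle_timeout:
            raise ConnectionClosed("idle timeout")
        self.last_activity = self.clock.now

    def begin(self):
        self._touch()
        self._pending = []

    def write(self, row):
        self._touch()
        self._pending.append(row)

    def commit(self):
        self._touch()
        self.committed.extend(self._pending)
        self._pending = []


def slow_external_call(clock, seconds=600):
    """Pretend to call an external API that takes `seconds` to answer."""
    clock.sleep(seconds)
    return {"status": "ok"}


def task_long_transaction(connect, clock):
    """Problem pattern: the transaction stays open across the slow call."""
    conn = connect()
    conn.begin()
    conn.write("started")
    result = slow_external_call(clock)  # connection sits idle in transaction
    conn.write(result["status"])        # raises ConnectionClosed
    conn.commit()
    return conn


def task_short_transaction(connect, clock):
    """Reworked pattern: slow work first, then connect, write and commit."""
    result = slow_external_call(clock)  # no connection is held here
    conn = connect()
    conn.begin()
    conn.write("started")
    conn.write(result["status"])
    conn.commit()
    return conn


if __name__ == "__main__":
    clock = FakeClock()
    connect = lambda: FakePooledConnection(clock, idle_timeout=240)
    try:
        task_long_transaction(connect, clock)
    except ConnectionClosed as exc:
        print("long transaction failed: %s" % exc)
    print(task_short_transaction(connect, clock).committed)
```

Note that the second task also avoids holding an idle connection outside a transaction, which matters here because client_idle_timeout applies to idle clients in general, not only to open transactions.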

    This is very similar to the problem described here:
    Celery task fails midway while pulling data from a large database

    Solution:
    Rework the transaction lifecycle: