Let me describe my setup and the problem. I have sync and async containers on Kubernetes. The pods are started by this command in my docker-entrypoint.sh:

gunicorn myappname.wsgi:application --worker-class gevent --workers 3 --bind 0.0.0.0:8000

In the async ones, I run some MQTT clients (paho.mqtt.client). This is my custom apps.py, which runs the clients asynchronously:
import asyncio
import importlib
import sys
from os import environ

from django.apps import AppConfig, apps
from django.db import connection

from .server import MQTTConsumer  # inherited from paho.mqtt.Client
from .consumers import ResponseConsumer


class CustomMqttConfig(AppConfig):
    default_auto_field = 'django.db.models.BigAutoField'
    name = 'custom.mqtt'

    def ready(self):
        for _app in apps.app_configs.keys():
            topic_patterns = [
                ('$share/my-app/response', ResponseConsumer)
            ]
            connection.close()
            for _topic, _consumer in topic_patterns:
                asyncio.run(MQTTConsumer()(topic=_topic, consumer=_consumer))

Everything works perfectly for the first 24 hours after my consumers start. After 24 hours, I get a "connection closed" error from my DB (in this case behind pg-pool) for every MQTT message. BUT when I open a shell in my async container (docker exec -it <container_id> bash) and start the Django shell (python manage.py shell), I can use my models and query the DB while my consumers keep throwing that error.
I checked my pg-pool (show pool_processes) before and after restarting my async pods and saw that everything works fine after the restart. New connections appear in pool_processes, and my consumers keep receiving and processing the MQTT messages without any error. I have been tracking these errors and noticed that they recur every 24 hours.
My async pods receive 300K MQTT messages almost every second. I thought I could check the DB connection for each incoming message and reconnect if necessary, but I believe that would cause performance problems.
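One way to avoid paying for a connection check on every message is to throttle it, so that it runs at most once per time window. The following is a minimal sketch of that idea, not the poster's code: ThrottledCheck and its interval parameter are hypothetical names, and the wiring shown in the trailing comments assumes Django's django.db.close_old_connections as the check, which closes connections that are unusable or older than CONN_MAX_AGE so that the next query opens a fresh one.

```python
import time
from typing import Callable, Optional


class ThrottledCheck:
    """Run a (possibly expensive) check at most once per `interval` seconds."""

    def __init__(
        self,
        check: Callable[[], None],
        interval: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._check = check
        self._interval = interval
        self._clock = clock
        self._last_run: Optional[float] = None  # None: the check has never run

    def __call__(self) -> bool:
        """Run the check if it is due; return True only when it actually ran."""
        now = self._clock()
        if self._last_run is not None and now - self._last_run < self._interval:
            return False  # still inside the window, skip the check
        self._check()
        self._last_run = now
        return True


# Hypothetical wiring inside a paho-mqtt message callback (assumption,
# not taken from the original post):
#
#   from django.db import close_old_connections
#   guard = ThrottledCheck(close_old_connections, interval=30.0)
#
#   def on_message(client, userdata, message):
#       guard()  # cheap on most calls: only a clock read and a comparison
#       ...      # handle the message and touch the ORM as usual
```

With this shape, the per-message cost is a monotonic clock read, and the real check runs only once per interval regardless of the message rate.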
Versions:
...
Django==5.0.2
paho-mqtt==2.0.0
...
Images:
bitnami/pgpool:latest
postgres:15
No extra parameters are defined in settings.py or anywhere else.
I would appreciate any comments or advice. Thanks in advance.

The problem was in the Pgpool-II configuration. Tuning child_life_time, max_pool, and num_init_children together with max_connections (PostgreSQL) solved my problem.
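
For anyone tuning the same parameters: the Pgpool-II documentation gives a sizing rule that ties them together, max_pool * num_init_children <= max_connections - superuser_reserved_connections, with the left-hand side doubled when query canceling is needed. The sketch below only checks that inequality. The function names are hypothetical and the numbers in the usage lines are illustrative, not the values I ended up with.

```python
def pgpool_backend_demand(
    num_init_children: int, max_pool: int, query_cancel: bool = False
) -> int:
    """Worst-case number of backend connections Pgpool-II may hold open."""
    demand = num_init_children * max_pool
    # Query canceling needs an extra connection per pooled connection.
    return demand * 2 if query_cancel else demand


def fits_postgres(
    num_init_children: int,
    max_pool: int,
    max_connections: int,
    superuser_reserved_connections: int = 3,
    query_cancel: bool = False,
) -> bool:
    """True if PostgreSQL can serve every connection Pgpool-II may open."""
    available = max_connections - superuser_reserved_connections
    demand = pgpool_backend_demand(num_init_children, max_pool, query_cancel)
    return demand <= available


# Illustrative values only:
print(fits_postgres(num_init_children=32, max_pool=4, max_connections=100))
print(fits_postgres(num_init_children=32, max_pool=4, max_connections=200))
```

child_life_time is not part of this inequality; it controls how long an idle Pgpool-II child process lives before it is replaced, so it has to be considered separately for long-lived clients like these consumers.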