Tags: django, postgresql, django-rest-framework, stress-testing, python-3.11

PostgreSQL connection error during web stress testing: "TCP/IP connections on port 5432?"


Environment: Django 4.1.3, DRF 3.14.0, psycopg2 2.9.5, PostgreSQL 14.7, gunicorn 21.2.0, eventlet 0.33.3

I ran a basic stress test against a Django application. My view is as follows:

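from rest_framework import status
from rest_framework.response import Response
from rest_framework.views import APIView


class HealthCheckView(APIView):
    def get(self, request, *args, **kwargs):
        try:
            # Trivial query on the project's Role model (import not shown) to check the database.
            Role.objects.count()
        except Exception:
            return Response(status=499)
        return Response(status=status.HTTP_200_OK)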

I use the following command to start the web service:

gunicorn myapp.wsgi:application -c myapp/settings/gunicorn.conf.py

gunicorn.conf.py content is as follows:

bind = f'0.0.0.0:{GUNICORN_PORT}'
worker_class = 'eventlet'
workers = 6

Then I used the ab tool for stress testing:

ab -n 10000 -c 500 https://10.30.7.7/api/v1/healthz/

At a concurrency of 500, the results are fine.

When I increased the concurrency further, some requests failed to connect to the database:

django.db.utils.OperationalError: could not connect to server: Cannot assign requested address
        Is the server running on host "10.30.7.7" and accepting
        TCP/IP connections on port 5432?

The database configuration file is as follows:

max_connections = 4096
shared_buffers = 16GB
effective_cache_size = 48GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 4
effective_io_concurrency = 2
work_mem = 1MB
huge_pages = try
min_wal_size = 1GB
max_wal_size = 4GB
max_worker_processes = 20
max_parallel_workers_per_gather = 4
max_parallel_workers = 20
max_parallel_maintenance_workers = 4

How should I investigate this?

Reducing the concurrency avoids the problem, but 500 concurrent requests is obviously not a lot. I have also tried running the database in Docker and building it from source. The problem occurs in both setups when concurrency is high.


Solution

  • I may have solved this problem, at least for now.

    The error seems to have been caused by database connections being created and destroyed too frequently.

    I changed the lifetime of Django's database connections by adding CONN_MAX_AGE to the database settings:

    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.postgresql_psycopg2',
            'NAME': POSTGRES_DB,
            'USER': POSTGRES_USER,
            'PASSWORD': POSTGRES_PASSWORD,
            'HOST': POSTGRES_HOST,
            'PORT': POSTGRES_PORT,
            'CONN_MAX_AGE': 600,
        }
    }
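
To check whether connection churn really is the cause, it can help to look at the TCP socket states on the web host. "Cannot assign requested address" is commonly a sign that the client has run out of ephemeral ports, often because many recently closed sockets are still in TIME_WAIT. The following is a minimal sketch, assuming a Linux host and IPv4 connections; `count_states` is a hypothetical helper name, and it only parses text in the `/proc/net/tcp` format.

```python
from collections import Counter

# TCP state codes as they appear (in hex) in /proc/net/tcp on Linux.
TCP_STATES = {
    "01": "ESTABLISHED",
    "02": "SYN_SENT",
    "03": "SYN_RECV",
    "04": "FIN_WAIT1",
    "05": "FIN_WAIT2",
    "06": "TIME_WAIT",
    "07": "CLOSE",
    "08": "CLOSE_WAIT",
    "09": "LAST_ACK",
    "0A": "LISTEN",
    "0B": "CLOSING",
}


def count_states(proc_net_tcp_text, remote_port=5432):
    """Count sockets per TCP state whose remote port equals remote_port."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        # fields[2] is the remote address as HEXIP:HEXPORT, fields[3] is the state code.
        port = int(fields[2].rsplit(":", 1)[1], 16)
        if port == remote_port:
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts
```

On the web host this could be called as `count_states(open("/proc/net/tcp").read())` while the test is running. A TIME_WAIT count in the tens of thousands would be consistent with the churn explanation.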
    

    1. gunicorn + eventlet (10 workers)

    A large number of connections are still created and closed.

    The connection count fluctuates between a few hundred and a few thousand.

    The error above still occurs with this setup.

    2. gunicorn + gthread (20 workers, 40 threads)

    Only a limited number of database connections are created:

    maximum connections = workers * threads = 20 * 40 = 800

    This setup works well.
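
The gthread setup from option 2 could look roughly like the sketch below. The port is a placeholder (the original config uses GUNICORN_PORT), and `max_db_connections` is a hypothetical name used only to show the arithmetic; it is not a gunicorn setting. The bound relies on Django keeping one database connection per thread, so with CONN_MAX_AGE set each worker thread reuses its own connection.

```python
# gunicorn.conf.py (sketch of the gthread variant described above)
bind = "0.0.0.0:8000"      # assumption: placeholder port
worker_class = "gthread"
workers = 20
threads = 40

# Upper bound on persistent database connections: one per worker thread.
# Hypothetical helper variable; it is not a gunicorn setting.
max_db_connections = workers * threads  # 20 * 40 = 800

# The bound should stay below PostgreSQL's max_connections (4096 in the question).
assert max_db_connections <= 4096
```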