Environment: Django 4.1.3, DRF 3.14.0, psycopg2 2.9.5, PostgreSQL 14.7, gunicorn 21.2.0, eventlet 0.33.3
I ran a basic stress test against a Django application. My view is as follows:
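from rest_framework import status
from rest_framework.response import Response
from rest_framework.views import APIView

# Role is an application model (import not shown).
class HealthCheckView(APIView):
    def get(self, request, *args, **kwargs):
        try:
            # Run a trivial query to verify that the database is reachable.
            Role.objects.filter().count()
        except Exception:
            return Response(status=499)
        return Response(status=status.HTTP_200_OK)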
I use the following command to start the web service:
gunicorn myapp.wsgi:application -c myapp/settings/gunicorn.conf.py
The contents of gunicorn.conf.py are as follows:
bind = f'0.0.0.0:{GUNICORN_PORT}'
worker_class = 'eventlet'
workers = 6
Then I used the ab tool for stress testing:
ab -n 10000 -c 500 https://10.30.7.7/api/v1/healthz/
With 500 concurrent requests, the results are fine.
When I increase the concurrency further, some requests fail to connect to the database:
django.db.utils.OperationalError: could not connect to server: Cannot assign requested address
    Is the server running on host "10.30.7.7" and accepting
    TCP/IP connections on port 5432?
The PostgreSQL configuration (postgresql.conf) is as follows:
max_connections = 4096
shared_buffers = 16GB
effective_cache_size = 48GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 4
effective_io_concurrency = 2
work_mem = 1MB
huge_pages = try
min_wal_size = 1GB
max_wal_size = 4GB
max_worker_processes = 20
max_parallel_workers_per_gather = 4
max_parallel_workers = 20
max_parallel_maintenance_workers = 4
How should I investigate this?
Lowering the concurrency avoids the problem, but 500 concurrent requests is not a heavy load. I have also tried running PostgreSQL in Docker and building and installing it from source; the problem appears in both setups whenever the concurrency is high.
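One possible starting point for the investigation is to count how many sockets towards the database port are in each TCP state on the machine running gunicorn. "Cannot assign requested address" is raised on the client side, so a large number of TIME_WAIT sockets to port 5432 would point at local port exhaustion rather than at PostgreSQL itself. The sketch below parses the Linux /proc/net/tcp table; the function name count_states and the SAMPLE rows are made up for illustration, and only IPv4 sockets are covered.

```python
from collections import Counter

# Subset of the kernel's TCP state codes as they appear in /proc/net/tcp.
TCP_STATES = {"01": "ESTABLISHED", "06": "TIME_WAIT", "08": "CLOSE_WAIT", "0A": "LISTEN"}


def count_states(proc_net_tcp_text, remote_port):
    """Count sockets per TCP state whose remote port equals remote_port."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        # fields[2] is "HEXIP:HEXPORT" of the remote end, fields[3] is the state code.
        port = int(fields[2].rsplit(":", 1)[1], 16)
        if port == remote_port:
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts


# Hypothetical sample rows in /proc/net/tcp format (0x1538 == 5432).
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 07071E0A:C350 07071E0A:1538 06 00000000:00000000 00:00000000 00000000     0        0 0
   1: 07071E0A:C351 07071E0A:1538 06 00000000:00000000 00:00000000 00000000     0        0 0
   2: 07071E0A:C352 07071E0A:1538 01 00000000:00000000 00:00000000 00000000  1000        0 12345
   3: 00000000:01BB 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 23456
"""

if __name__ == "__main__":
    # On a Linux host, read the real table instead of SAMPLE:
    #     with open("/proc/net/tcp") as f: text = f.read()
    print(count_states(SAMPLE, 5432))
```

Running this repeatedly during the ab test, with the real file instead of SAMPLE, should show whether the TIME_WAIT count climbs into the tens of thousands shortly before the errors start.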
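I may have solved this problem, at least for now.
It seems to be caused by connections being created and destroyed too frequently. With the default CONN_MAX_AGE of 0, Django closes the database connection at the end of every request, and each closed connection leaves a client-side socket in TIME_WAIT. Under heavy load the client can run out of local ports, which is likely what "Cannot assign requested address" indicates here.
I added CONN_MAX_AGE to Django's database settings so that connections are kept open and reused: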
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': POSTGRES_DB,
        'USER': POSTGRES_USER,
        'PASSWORD': POSTGRES_PASSWORD,
        'HOST': POSTGRES_HOST,
        'PORT': POSTGRES_PORT,
        # Keep each connection open for up to 600 seconds instead of closing it after every request.
        'CONN_MAX_AGE': 600,
    }
}
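As a rough back-of-the-envelope check of why one new connection per request can exhaust local ports, the sketch below estimates the highest sustained rate of new connections to a single database address. It assumes the usual Linux defaults (net.ipv4.ip_local_port_range of 32768 to 60999 and a 60 second TIME_WAIT); these values are assumptions about the host, not something taken from the setup above, and options such as tcp_tw_reuse change the picture.

```python
def max_sustained_connects_per_second(port_low=32768, port_high=60999, time_wait_seconds=60):
    """Upper bound on new connections per second to one (host, port) pair
    before every ephemeral port is occupied by a socket in TIME_WAIT.

    The defaults are assumed Linux defaults, not measured values.
    """
    usable_ports = port_high - port_low + 1
    return usable_ports / time_wait_seconds


if __name__ == "__main__":
    rate = max_sustained_connects_per_second()
    print(f"about {rate:.1f} new connections per second")
```

Under these assumptions the ceiling is roughly 470 new connections per second, which a health-check endpoint under ab can exceed easily when every request opens and closes its own database connection.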
I compared two worker configurations:
1. gunicorn + eventlet (10 workers)
A large number of connections are still created and closed.
The connection count fluctuates between a few hundred and a few thousand.
This setup still produces the error above.
2. gunicorn + gthread (20 workers, 40 threads)
Only a limited number of database connections are created.
Maximum number of connections = workers * threads = 20 * 40 = 800
This setup works well.
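For reference, the gthread variant might look like the following gunicorn.conf.py. This is a sketch, not the exact file used: reading GUNICORN_PORT from the environment (with 8000 as a fallback) is an assumption, and expected_max_db_connections is only a helper name for the arithmetic, not a gunicorn setting.

```python
# gunicorn.conf.py (sketch of the gthread variant)
import os

# Assumption: the port comes from an environment variable; the fallback is arbitrary.
GUNICORN_PORT = os.environ.get("GUNICORN_PORT", "8000")

bind = f"0.0.0.0:{GUNICORN_PORT}"
worker_class = "gthread"
workers = 20
threads = 40

# Django keeps one database connection per thread, so with persistent
# connections the upper bound is workers * threads.
# This name is a helper for illustration, not a gunicorn setting.
expected_max_db_connections = workers * threads
```

With these values the upper bound is 800 connections, which stays well below the max_connections = 4096 configured in PostgreSQL.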