I have an app that uses nginx
to serve my Python Flask
app in production that only after a few requests starts locking up and timing out (will serve the first request or two quickly then start timing out and locking up afterwards). The Nginx
app is served via Docker
, the uwsgi Python
app is served on barebones macOS
(this Python app interfaces with the Docker instance running on the OS itself), the routing occurs via Traefik
.
This problem only occurs in production and the only difference there is I'm using Traefik's LetsEncrypt SSL certs to use HTTPS to protect the API. I've narrowed the problem down to the following two docker-compose
config lines (when present the problem persists, when removed the problem is corrected but SSL no longer is enabled):
- "traefik.http.routers.harveyapi.tls=true"
- "traefik.http.routers.harveyapi.tls.certresolver=letsencrypt"
Once locked up, I must restart the uwsgi processes to fix the problem just to have it lock right back up. Restarting nginx (Docker container) doesn't fix the problem which leads me to believe that uwsgi doesn't like the SSL config I'm using? Once I disable SSL support, I can send 2000 requests to the API and have it only take a second or two. Once enabled again, uwsgi can't even respond to 2 requests.
I'd like to be able to support SSL certs to enforce HTTPS connections to this API. I can currently run HTTP with this setup fine (thousands of concurrent connections) but that breaks when trying to use HTTPS.
I host dozens of other PHP sites with near identical setups. The only difference between those projects and this one is that they run PHP in Docker and this runs Python Uwsgi on barebones macOS. Here is the complete dump of configs for this project:
traefik.toml
# Traefik v2 Configuration
# Documentation: https://doc.traefik.io/traefik/migration/v1-to-v2/
[entryPoints]
# http should be redirected to https
[entryPoints.web]
address = ":80"
[entryPoints.web.http.redirections.entryPoint]
to = "websecure"
scheme = "https"
[entryPoints.websecure]
address = ":443"
[entryPoints.websecure.http.tls]
certResolver = "letsencrypt"
# Enable ACME (Let's Encrypt): automatic SSL
[certificatesResolvers.letsencrypt.acme]
email = "email@example.com"
storage = "/etc/traefik/acme/acme.json"
[certificatesResolvers.letsencrypt.acme.httpChallenge]
entryPoint = "web"
[log]
level = "DEBUG"
# Enable Docker Provider
[providers.docker]
endpoint = "unix:///var/run/docker.sock"
exposedByDefault = false # Must pass `traefik.enable=true` label to use Traefik
network = "traefik"
# Enable Ping (used for healthcheck)
[ping]
docker-compose.yml
version: "3.8"
services:
harvey-nginx:
build: .
restart: always
networks:
- traefik
labels:
- traefik.enable=true
labels:
- "traefik.http.routers.harveyapi.rule=Host(`project.com`, `www.project.com`)"
- "traefik.http.routers.harveyapi.tls=true"
- "traefik.http.routers.harveyapi.tls.certresolver=letsencrypt"
networks:
traefik:
name: traefik
uwsgi.ini
[uwsgi]
; uwsgi setup
master = true
memory-report = true
auto-procname = true
strict = true
vacuum = true
die-on-term = true
need-app = true
; concurrency
enable-threads = true
cheaper-initial = 5 ; workers to spawn on startup
cheaper = 2 ; minimum number of workers to go down to
workers = 10 ; highest number of workers to run
; workers
harakiri = 60 ; Restart workers if they have hung on a single request
max-requests = 500 ; Restart workers after this many requests
max-worker-lifetime = 3600 ; Restart workers after this many seconds
reload-on-rss = 1024 ; Restart workers after this much resident memory
reload-mercy = 3 ; How long to wait before forcefully killing workers
worker-reload-mercy = 3 ; How long to wait before forcefully killing workers
; app setup
protocol = http
socket = 127.0.0.1:5000
module = wsgi:APP
; daemonization
; TODO: Name processes `harvey` here
daemonize = /tmp/harvey_daemon.log
nginx.conf
server {
listen 80;
error_log /var/log/nginx/error.log;
access_log /var/log/nginx/access.log;
location / {
include uwsgi_params;
# TODO: Please note this only works for macOS: https://docs.docker.com/desktop/networking/#i-want-to-connect-from-a-container-to-a-service-on-the-host
# and will require adjusting for your OS.
proxy_pass http://host.docker.internal:5000;
}
}
Dockerfile
FROM nginx:1.23-alpine
RUN rm /etc/nginx/conf.d/default.conf
COPY nginx.conf /etc/nginx/conf.d
I've added additional findings on the GitHub issue where I've documented my journey for this problem: https://github.com/Justintime50/harvey/issues/67
This is no longer a problem and the solution is real frustrating - it was Docker's fault. For ~6 months there was a bug in Docker that was dropping connections (ultimately leading to the timeouts mentioned above) which was finally fixed in Docker Desktop 4.14.
The moment I upgraded Docker (it had just come out at the time and I thought I would try the hail Mary upgrade having already turned every dial and adjusted every config param without any luck), it finally stopped timing out and dropping connections. I was suddenly able to send through tens of thousands of concurrent requests without issue.
TLDR: uWSGI, Nginx, nor my config were at fault here. Docker had a bug that has been patched. If others on macOS are facing this problem, try upgrading to at least Docker Dekstop 4.14
.