uWSGI Locking Up After a Few Requests with an Nginx/Traefik/Flask App Running over HTTPS/TLS and Docker


Problem

I have a Python Flask app served in production through nginx that starts locking up and timing out after only a few requests: it serves the first request or two quickly, then every later request hangs. Nginx runs in Docker, the uWSGI Python app runs directly on the macOS host (the app interfaces with the Docker instance running on the OS itself), and routing is handled by Traefik.

Findings

This problem only occurs in production, and the only difference there is that I use Traefik's Let's Encrypt certificates to serve the API over HTTPS. I've narrowed the problem down to the following two docker-compose labels (with them present the problem persists; with them removed the problem goes away, but SSL is no longer enabled):

      - "traefik.http.routers.harveyapi.tls=true"
      - "traefik.http.routers.harveyapi.tls.certresolver=letsencrypt"

Once it locks up, I must restart the uWSGI processes to recover, only to have them lock right back up. Restarting nginx (the Docker container) doesn't fix the problem, which leads me to believe that uWSGI doesn't like the SSL config I'm using. With SSL disabled, I can send 2000 requests to the API and have them all complete in a second or two. With SSL enabled again, uWSGI can't even respond to 2 requests.
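For anyone trying to reproduce the "fine for two requests, then timeouts" pattern, a burst of requests can be sketched roughly as below. This is only an illustration using the Python standard library, not the tool used in the original tests; the names `fetch` and `load_test` and the default values are assumptions.

```python
import concurrent.futures
import http.client
import time
import urllib.error
import urllib.request


def fetch(url, timeout=5.0):
    """Return (status, seconds); status is None if the request failed or timed out."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, just with an error status code.
        status = exc.code
    except (OSError, http.client.HTTPException):
        # Connection refused, reset, dropped, or timed out.
        status = None
    return status, time.perf_counter() - start


def load_test(url, total=2000, concurrency=50, timeout=5.0):
    """Send `total` GET requests with `concurrency` threads and summarize the results."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: fetch(url, timeout), range(total)))
    ok = sum(1 for status, _ in results if status is not None and status < 400)
    slowest = max((elapsed for _, elapsed in results), default=0.0)
    return {"ok": ok, "failed": total - ok, "slowest": slowest}
```

Calling `load_test("https://project.com/")` once with the TLS labels present and once against the plain HTTP setup should make the difference visible: a healthy run reports `failed` as 0, while a locked-up backend shows most requests failing and `slowest` close to the timeout.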

Desired Outcome

I'd like to use SSL certs to enforce HTTPS connections to this API. The setup currently handles HTTP fine (thousands of concurrent connections) but breaks as soon as HTTPS is enabled.

Configs

I host dozens of other PHP sites with near-identical setups. The only difference between those projects and this one is that they run PHP in Docker, while this one runs Python under uWSGI directly on the macOS host. Here is the complete dump of configs for this project:

traefik.toml

# Traefik v2 Configuration
# Documentation: https://doc.traefik.io/traefik/migration/v1-to-v2/

[entryPoints]
  # http should be redirected to https
  [entryPoints.web]
    address = ":80"
    [entryPoints.web.http.redirections.entryPoint]
      to = "websecure"
      scheme = "https"
  [entryPoints.websecure]
    address = ":443"
  [entryPoints.websecure.http.tls]
    certResolver = "letsencrypt"

# Enable ACME (Let's Encrypt): automatic SSL
[certificatesResolvers.letsencrypt.acme]
  email = "email@example.com"
  storage = "/etc/traefik/acme/acme.json"
  [certificatesResolvers.letsencrypt.acme.httpChallenge]
    entryPoint = "web"

[log]
  level = "DEBUG"

# Enable Docker Provider
[providers.docker]
  endpoint = "unix:///var/run/docker.sock"
  exposedByDefault = false # Must pass `traefik.enable=true` label to use Traefik
  network = "traefik"

# Enable Ping (used for healthcheck)
[ping]

docker-compose.yml

version: "3.8"
services:
  harvey-nginx:
    build: .
    restart: always
    networks:
      - traefik
    labels:
      - traefik.enable=true
      - "traefik.http.routers.harveyapi.rule=Host(`project.com`, `www.project.com`)"
      - "traefik.http.routers.harveyapi.tls=true"
      - "traefik.http.routers.harveyapi.tls.certresolver=letsencrypt"


networks:
  traefik:
    name: traefik

uwsgi.ini

[uwsgi]
; uwsgi setup
master = true
memory-report = true
auto-procname = true
strict = true
vacuum = true
die-on-term = true
need-app = true

; concurrency
enable-threads = true
cheaper-initial = 5   ; workers to spawn on startup
cheaper = 2           ; minimum number of workers to go down to
workers = 10          ; highest number of workers to run

; workers
harakiri = 60               ; Restart workers if they have hung on a single request
max-requests = 500          ; Restart workers after this many requests
max-worker-lifetime = 3600  ; Restart workers after this many seconds
reload-on-rss = 1024        ; Restart workers after this much resident memory
reload-mercy = 3            ; How long to wait before forcefully killing workers
worker-reload-mercy = 3     ; How long to wait before forcefully killing workers

; app setup
protocol = http
socket = 127.0.0.1:5000
module = wsgi:APP

; daemonization
; TODO: Name processes `harvey` here
daemonize = /tmp/harvey_daemon.log
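One way to rule the application code in or out when uWSGI appears to hang is to point it temporarily at a trivial WSGI callable. The sketch below is an assumption-laden stand-in, not the real Flask app: it uses a plain WSGI function so it has no dependencies, and the file name `wsgi_stub.py` is hypothetical.

```python
# wsgi_stub.py (hypothetical file name): a do-nothing WSGI app for isolating hangs.
# It exposes a callable named APP, matching the `module = wsgi:APP` convention above.


def APP(environ, start_response):
    """Answer every request with a tiny plain-text body."""
    body = b"ok\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

With `module = wsgi_stub:APP` set in uwsgi.ini for the duration of the test, requests that still time out point away from the Flask code and toward the layers in front of uWSGI.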

nginx.conf

server {
    listen 80;
    error_log  /var/log/nginx/error.log;
    access_log /var/log/nginx/access.log;

    location / {
        include uwsgi_params;
        # TODO: Please note this only works for macOS: https://docs.docker.com/desktop/networking/#i-want-to-connect-from-a-container-to-a-service-on-the-host
        # and will require adjusting for your OS.
        proxy_pass http://host.docker.internal:5000;
    }
}

Dockerfile

FROM nginx:1.23-alpine

RUN rm /etc/nginx/conf.d/default.conf
COPY nginx.conf /etc/nginx/conf.d

Additional Context

I've added additional findings on the GitHub issue where I've documented my journey for this problem: https://github.com/Justintime50/harvey/issues/67


Solution

  • This is no longer a problem, and the solution is really frustrating: it was Docker's fault. For about 6 months there was a bug in Docker that was dropping connections (ultimately leading to the timeouts mentioned above), which was finally fixed in Docker Desktop 4.14.

    The moment I upgraded Docker (the release had just come out, and I tried it as a Hail Mary after having already turned every dial and adjusted every config param without any luck), it finally stopped timing out and dropping connections. I was suddenly able to send through tens of thousands of concurrent requests without issue.

    TL;DR: Neither uWSGI, Nginx, nor my config was at fault here. Docker had a bug that has since been patched. If others on macOS are facing this problem, try upgrading to at least Docker Desktop 4.14.
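To check whether a machine is already on a fixed release, something like the following rough sketch could help. It assumes that `docker version --format '{{.Server.Platform.Name}}'` prints a string of the form "Docker Desktop 4.14.0 (build)", which is how Docker Desktop usually reports itself; the helper names are hypothetical, and the parsing should be adjusted if the output looks different.

```python
import re
import subprocess

# Minimum Docker Desktop release containing the fix, per the answer above.
MIN_VERSION = (4, 14)


def parse_desktop_version(platform_name):
    """Extract (major, minor) from a 'Docker Desktop X.Y.Z ...' string, else None."""
    match = re.search(r"Docker Desktop (\d+)\.(\d+)", platform_name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_patched(platform_name, minimum=MIN_VERSION):
    """True only if the string names a Docker Desktop release at or above `minimum`."""
    version = parse_desktop_version(platform_name)
    return version is not None and version >= minimum


def installed_platform_name():
    """Ask the local docker CLI for the server platform name ('' if unavailable)."""
    try:
        # Assumption: the server platform name carries the Docker Desktop version.
        result = subprocess.run(
            ["docker", "version", "--format", "{{.Server.Platform.Name}}"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        return ""
    return result.stdout.strip()
```

Running `is_patched(installed_platform_name())` returns `False` both for older Docker Desktop releases and when the version cannot be determined, so a `False` result is a prompt to check the version by hand rather than proof of the bug.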