docker, docker-compose, docker-healthcheck

Docker Compose healthcheck: service never becomes unhealthy


I have a compose file with three services (database, backend and frontend). Backend depends on database being healthy, and frontend depends on backend being healthy.

The database (Postgres) checks its own health using pg_isready, and the backend (FastAPI) checks its health via the endpoint http://localhost:8080/healthcheck.

Compose file:

version: '3'
services:
  
  database:
    image: postgres:14-alpine
    healthcheck:
      test: pg_isready -U postgres
      interval: 1s
      timeout: 5s
      retries: 5
      start_period: 10s

  backend:
    depends_on:
      database:
        condition: service_healthy

    image: backend-api-image
    build: 
      context: backend
      dockerfile: Dockerfile

    ports:
      - "8080:8080"
    volumes:
      - './backend:/backend'

    healthcheck:
      test: wget --no-verbose --tries=1 --spider http://localhost:8080/healthcheck || exit 1
      interval: 1s
      timeout: 5s

  frontend:
    image: my-frontend
    depends_on:
      backend:
        condition: service_healthy
    build:
      context: ./frontend
      dockerfile: Dockerfile

FastAPI app:

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

@app.get('/healthcheck')
def get_healthcheck():
    return 'OK'

So far this all works as expected. If, for example, I had a typo in my healthcheck endpoint route (in my app), startup would fail, like so:

database  | 2023-06-01 23:01:44.410 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
database  | 2023-06-01 23:01:44.410 UTC [1] LOG:  listening on IPv6 address "::", port 5432
database  | 2023-06-01 23:01:44.411 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
database  | 2023-06-01 23:01:44.414 UTC [22] LOG:  database system was shut down at 2023-06-01 22:51:10 UTC
database  | 2023-06-01 23:01:44.417 UTC [1] LOG:  database system is ready to accept connections
backend   | INFO:     Will watch for changes in these directories: ['/backend']
backend   | INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
backend   | INFO:     Started reloader process [1] using StatReload
backend   | INFO:     Started server process [8]
backend   | INFO:     Waiting for application startup.
backend   | INFO:     Application startup complete.
backend   | INFO:     127.0.0.1:41294 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend   | INFO:     127.0.0.1:41296 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend   | INFO:     127.0.0.1:41298 - "GET /healthcheck HTTP/1.1" 404 Not Found
dependency failed to start: container backend is unhealthy

Where I'm getting confused is that, after a successful startup, if I change the app so that the backend's healthcheck fails, the container detects the change and the check returns a 404 (as expected), but the backend never becomes unhealthy.

database  | 2023-06-01 23:06:37.396 UTC [1] LOG:  listening on IPv4 address "0.0.0.0", port 5432
database  | 2023-06-01 23:06:37.396 UTC [1] LOG:  listening on IPv6 address "::", port 5432
database  | 2023-06-01 23:06:37.397 UTC [1] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
database  | 2023-06-01 23:06:37.400 UTC [22] LOG:  database system was shut down at 2023-06-01 23:06:34 UTC
database  | 2023-06-01 23:06:37.403 UTC [1] LOG:  database system is ready to accept connections
backend   | INFO:     Will watch for changes in these directories: ['/backend']
backend   | INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
backend   | INFO:     Started reloader process [1] using StatReload
backend   | INFO:     Started server process [9]
backend   | INFO:     Waiting for application startup.
backend   | INFO:     Application startup complete.
backend   | INFO:     127.0.0.1:49450 - "GET /healthcheck HTTP/1.1" 200 OK
frontend  | 
frontend  | > frontend@0.0.0 dev
frontend  | > vite --host
frontend  | 
frontend  | Forced re-optimization of dependencies
frontend  | 
frontend  |   VITE v4.3.1  ready in 285 ms
frontend  | 
frontend  |   ➜  Local:   http://localhost:5173/
frontend  |   ➜  Network: http://172.26.0.4:5173/
backend   | INFO:     127.0.0.1:57966 - "GET /healthcheck HTTP/1.1" 200 OK
backend   | INFO:     127.0.0.1:57968 - "GET /healthcheck HTTP/1.1" 200 OK
backend   | INFO:     127.0.0.1:57982 - "GET /healthcheck HTTP/1.1" 200 OK
backend   | INFO:     127.0.0.1:57992 - "GET /healthcheck HTTP/1.1" 200 OK
backend   | INFO:     127.0.0.1:58002 - "GET /healthcheck HTTP/1.1" 200 OK
backend   | INFO:     127.0.0.1:58012 - "GET /healthcheck HTTP/1.1" 200 OK
backend   | INFO:     127.0.0.1:58018 - "GET /healthcheck HTTP/1.1" 200 OK
backend   | WARNING:  StatReload detected changes in 'src/main.py'. Reloading...
backend   | INFO:     Shutting down
backend   | INFO:     Waiting for application shutdown.
backend   | INFO:     Application shutdown complete.
backend   | INFO:     Finished server process [9]
backend   | INFO:     Started server process [76]
backend   | INFO:     Waiting for application startup.
backend   | INFO:     Application startup complete.
backend   | INFO:     127.0.0.1:58028 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend   | INFO:     127.0.0.1:58040 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend   | INFO:     127.0.0.1:35092 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend   | INFO:     127.0.0.1:35098 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend   | INFO:     127.0.0.1:35102 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend   | INFO:     127.0.0.1:35116 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend   | INFO:     127.0.0.1:35126 - "GET /healthcheck HTTP/1.1" 404 Not Found
backend   | INFO:     127.0.0.1:35134 - "GET /healthcheck HTTP/1.1" 404 Not Found

What I expected:

After a successful startup, when I change the backend code so that its healthcheck fails, I expected frontend to exit or become degraded somehow, since its health dependency has failed.

What happened:

Everything kept running as if nothing happened, even though the backend healthcheck returned a failing value.

My questions:


Solution

  • In trying to reproduce the behavior you've described, the first problem I ran into is that the standard (GNU) version of wget makes HEAD requests when given the --spider option, and the route only accepts GET, so your healthcheck results in:

    "HEAD /healthcheck HTTP/1.1" 405 Method Not Allowed
    

    This is using wget version 1.21 as installed in the python:3.11 image. I modified the healthcheck to look like this (and dropped the irrelevant parts of your docker-compose.yaml):

    version: '3'
    services:
    
      backend:
        image: backend-api-image
        build:
          context: backend
          dockerfile: Dockerfile
    
        ports:
          - "8080:8080"
        volumes:
          - './backend:/backend'
    
        healthcheck:
          test: wget --no-verbose -O /dev/null --tries=1 http://localhost:8080/healthcheck || exit 1
          interval: 1s
          timeout: 5s
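
Since the request that wget --spider sends evidently depends on the wget build, the probe could also be written in a few lines of standard-library Python, which the python:3.11 image already ships. The following is only a sketch under assumptions: the names `is_healthy` and `main` are made up for illustration, and it treats any 2xx answer to a GET request as healthy.

```python
import sys
import urllib.error
import urllib.request

# Ignore proxy settings from the environment: the probe targets localhost.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True only if a GET request to `url` is answered with a 2xx status."""
    try:
        with _OPENER.open(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # 4xx/5xx responses raise HTTPError (a URLError subclass); refused
        # connections and timeouts surface as URLError or OSError.
        return False


def main(argv: list) -> int:
    """Map the probe result to the exit code Docker expects (0 healthy, 1 unhealthy)."""
    url = argv[1] if len(argv) > 1 else "http://localhost:8080/healthcheck"
    return 0 if is_healthy(url) else 1

# As a standalone script, finish with: raise SystemExit(main(sys.argv))
```

Saved under a hypothetical name such as healthcheck.py, with the final line uncommented, it could stand in for the wget command in the `test:` entry.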
    

    I have your example FastAPI code in backend/backend.py, and my backend/Dockerfile looks like:

    FROM python:3.11
    
    WORKDIR /app
    RUN python3 -m venv .venv
    ENV PATH=/app/.venv/bin:/usr/local/bin:/usr/bin:/bin
    COPY requirements.txt ./
    RUN . .venv/bin/activate && pip install -r requirements.txt
    COPY . ./
    
    CMD ["uvicorn", "--reload", "--host", "0.0.0.0", "--port", "8080", "backend:app"]
    

    When I run docker-compose up, I see:

    backend_1  | INFO:     127.0.0.1:44856 - "GET /healthcheck HTTP/1.1" 200 OK
    backend_1  | INFO:     127.0.0.1:44884 - "GET /healthcheck HTTP/1.1" 200 OK
    

    ...and the container enters the "healthy" state:

    NAME                  IMAGE               COMMAND                  SERVICE             CREATED             STATUS                    PORTS
    webserver_backend_1   backend-api-image   "uvicorn --reload --…"   backend             24 seconds ago      Up 23 seconds (healthy)   0.0.0.0:8080->8080/tcp, :::8080->8080/tcp
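
The STATUS column above reflects the container's health state, which can also be read directly from the JSON that docker inspect prints. As a rough sketch (the helper names `parse_health` and `container_health` are hypothetical, and `webserver_backend_1` is simply the container name from the listing above), the relevant fields could be extracted like this:

```python
import json
import subprocess


def parse_health(inspect_output: str) -> dict:
    """Pull the health fields out of the JSON printed by `docker inspect <container>`."""
    containers = json.loads(inspect_output)
    health = containers[0].get("State", {}).get("Health")
    if health is None:
        # The container has no healthcheck configured.
        return {"status": "none", "failing_streak": 0, "last_exit_code": None}
    log = health.get("Log") or []
    return {
        "status": health.get("Status"),
        "failing_streak": health.get("FailingStreak", 0),
        "last_exit_code": log[-1].get("ExitCode") if log else None,
    }


def container_health(name: str) -> dict:
    """Run `docker inspect` for one container and summarize its health."""
    result = subprocess.run(
        ["docker", "inspect", name],
        capture_output=True, text=True, check=True,
    )
    return parse_health(result.stdout)

# Example (requires a running Docker daemon):
# print(container_health("webserver_backend_1"))
```

Checking the state this way avoids inferring it from the interleaved compose log output.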
    

    If I docker exec into the container and modify the FastAPI application so that the healthcheck endpoint returns an error, the server reloads and the healthcheck requests start failing:

    backend_1  | WARNING:  StatReload detected changes in 'backend.py'. Reloading...
    backend_1  | INFO:     Shutting down
    backend_1  | INFO:     Waiting for application shutdown.
    backend_1  | INFO:     Application shutdown complete.
    backend_1  | INFO:     Finished server process [8]
    backend_1  | INFO:     Started server process [1050]
    backend_1  | INFO:     Waiting for application startup.
    backend_1  | INFO:     Application startup complete.
    backend_1  | INFO:     127.0.0.1:44618 - "GET /healthcheck HTTP/1.1" 400 Bad Request
    backend_1  | INFO:     127.0.0.1:48912 - "GET /healthcheck HTTP/1.1" 400 Bad Request
    

    And the container enters the "unhealthy" state:

    NAME                  IMAGE               COMMAND                  SERVICE             CREATED             STATUS                     PORTS
    webserver_backend_1   backend-api-image   "uvicorn --reload --…"   backend             2 minutes ago       Up 2 minutes (unhealthy)   0.0.0.0:8080->8080/tcp, :::8080->8080/tcp
    

    That all seems to work as expected: the container health status changes as the response from the FastAPI service changes.
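
How the status flips can be modeled in a few lines. This is a simplified sketch, not Docker's actual implementation: it assumes that a passing probe marks the container healthy and that `retries` consecutive failures (3 by default when the compose file sets none, as in the file above) mark it unhealthy, and it ignores start_period entirely. The function name `health_transitions` is hypothetical.

```python
from typing import Iterable, List


def health_transitions(results: Iterable[bool], retries: int = 3) -> List[str]:
    """Return the modeled status after each probe; True means the probe exited 0."""
    status = "starting"
    failing_streak = 0
    history = []
    for ok in results:
        if ok:
            # Any success resets the streak and marks the container healthy.
            failing_streak = 0
            status = "healthy"
        else:
            failing_streak += 1
            # Only a run of `retries` consecutive failures flips the status.
            if failing_streak >= retries:
                status = "unhealthy"
        history.append(status)
    return history
```

Under these assumptions, two passing probes followed by three failing ones leave the container healthy until the third failure, which at an interval of 1s is roughly three seconds after the endpoint starts failing.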

    Here are some questions to help further diagnose things on your end: