kubernetes google-kubernetes-engine

Kubernetes startupProbe fails even though app becomes healthy within allowed threshold


I'm running into an issue with my (GKE) Kubernetes deployment's startupProbe. My container exposes a /v1/health endpoint that returns JSON with a "status" field. The probe is configured as follows:

startupProbe:
  exec:
    command:
      - sh
      - -c
      - >
          curl --silent --fail http://localhost:8080/v1/health |
          grep --quiet -e '\"status\":\"healthy\"'
  initialDelaySeconds: 20
  periodSeconds: 10
  timeoutSeconds: 10
  failureThreshold: 18

This should allow roughly 3 minutes (18 failures × 10 seconds, counted after the 20-second initial delay) for the app to become healthy. However, the probe keeps failing and the pod restarts, even though:

  • The health endpoint returns "status":"undetermined" for a while, then switches to "status":"healthy" (usually within the 3-minute window).

  • If I manually exec into the pod and run the probe command, it succeeds once the app is up.

 k exec -ti <> -- sh -c 'curl -s http://localhost:8080/v1/health'
{"build_info":{"app_name":"<>","app_version":"<>","build_timestamp":"2025-04-16T18:08:36Z","built_by":"<>","commit_id":"<>"},"status":"healthy","uptime":"13m51.712240388s"}

Both curl and grep are present in the image.
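
The manual check above only runs curl, so the grep stage of the probe can be exercised separately. The sketch below does this locally against trimmed sample bodies; the body for the starting state is a made-up minimal example, and probe_grep is a hypothetical helper name. The backslashes from the probe's pattern are dropped here: inside single quotes they reach grep literally, and POSIX leaves the meaning of \" in a basic regular expression undefined (GNU grep appears to treat it as a plain quote, so the original pattern likely behaves the same there).

```shell
# Hypothetical helper: applies the probe's grep stage to a JSON body read
# from stdin and returns grep's exit status (0 = match, 1 = no match).
probe_grep() {
  grep --quiet -e '"status":"healthy"'
}

# Trimmed sample bodies; the "undetermined" one is an illustrative stand-in.
healthy='{"status":"healthy","uptime":"13m51.712240388s"}'
starting='{"status":"undetermined"}'

if printf '%s' "$healthy" | probe_grep; then
  echo "healthy body: probe would pass"
fi

if ! printf '%s' "$starting" | probe_grep; then
  echo "undetermined body: probe would fail"
fi
```

Because the probe is a pipeline, its exit status is that of grep, so an "undetermined" response counts as a failed probe even though curl itself succeeded.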

This is the output when I describe the pod:

Warning  Unhealthy                          10m (x4 over 11m)  kubelet                  Startup probe failed:

Solution

  • The health check was eventually passing and the restarts stopped. The issue is that Kubernetes does not report when a health check stops failing and starts passing, so the old "Startup probe failed" warning stays in the describe output even after the probe succeeds.

    I only realized this after a long time spent observing and troubleshooting, when I noticed that the event counter (x4) had not increased for many minutes.
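
Since events only show the failures, one way to check the current state is to read the container status directly. This is a minimal sketch, assuming the container with the probe is the first one in the pod; the function names are hypothetical, and the pod name is passed as the first argument. It relies on the started and restartCount fields of the pod's containerStatuses.

```shell
# Hypothetical helpers. Assumes the probed container is index 0 in the pod.

# Prints "true" once the container is considered started, which for a
# container with a startupProbe means the probe has succeeded.
startup_probe_passed() {
  kubectl get pod "$1" -o jsonpath='{.status.containerStatuses[0].started}'
}

# Prints how many times the container has been restarted so far.
restart_count() {
  kubectl get pod "$1" -o jsonpath='{.status.containerStatuses[0].restartCount}'
}
```

Calling startup_probe_passed with the pod name a few minutes after a deploy, together with a restart count that no longer grows, would indicate that the earlier warnings are stale rather than ongoing.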