I'm running into an issue with my (GKE) Kubernetes deployment's startupProbe. My container exposes a /v1/health endpoint that returns JSON with a "status" field. The probe is configured as follows:
startupProbe:
  exec:
    command:
      - sh
      - -c
      - >
        curl --silent --fail http://localhost:8080/v1/health |
        grep --quiet '"status":"healthy"'
  initialDelaySeconds: 20
  periodSeconds: 10
  timeoutSeconds: 10
  failureThreshold: 18
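For reference, an exec probe succeeds only when the command exits 0, which for this pipeline is grep's exit code. A rough loop like the one below (my own sketch, run inside the container, not part of the manifest) shows what the probe sees while the app starts up:

# Debug loop mirroring the probe: curl's exit code shows whether the
# endpoint is reachable, grep's whether the body already says "healthy".
while true; do
  body=$(curl --silent --fail http://localhost:8080/v1/health)
  curl_rc=$?
  printf '%s' "$body" | grep --quiet '"status":"healthy"'
  grep_rc=$?
  echo "$(date +%T) curl=$curl_rc grep=$grep_rc body=$body"
  sleep 10   # same cadence as periodSeconds
done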
With the 20-second initial delay plus 18 allowed failures at a 10-second period, this should give the app up to 20 + 18 × 10 = 200 seconds (a bit over 3 minutes) to become healthy. However, the probe keeps failing and the pod restarts, even though:
The health endpoint returns "status":"undetermined" for a while, then switches to "status":"healthy" (usually within the 3-minute window).
If I manually exec into the pod and run the probe command, it succeeds once the app is up.
k exec -ti <> -- sh -c 'curl -s http://localhost:8080/v1/health'
{"build_info":{"app_name":"<>","app_version":"<>","build_timestamp":"2025-04-16T18:08:36Z","built_by":"<>","commit_id":"<>"},"status":"healthy","uptime":"13m51.712240388s"}
Both curl and grep are present in the image.
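To rule out a missing binary, I verified that with a quick check along these lines (pod name elided as above):

# Confirm both binaries resolve inside the container the probe runs in.
k exec <> -- sh -c 'command -v curl && command -v grep'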
This is the output when I describe the pod:
Warning Unhealthy 10m (x4 over 11m) kubelet Startup probe failed:
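Instead of re-running describe, a watch along these lines (assuming field selectors on involvedObject.name and reason; pod name elided) streams the failures as they happen:

# Stream new probe-failure events for this pod instead of polling describe.
k get events --watch --field-selector involvedObject.name=<>,reason=Unhealthy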
It turned out the health check did eventually start passing, and the restarts stopped on their own. The confusing part is that Kubernetes only records an event when a probe fails; it never reports when a failing probe starts passing again. After observing and troubleshooting for a long time, I only realized this when the failure counter in the event (the "x4 over 11m" part) stopped increasing.
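For anyone hitting the same thing: since no event marks the moment a startup probe begins to pass, the only positive signal I know of is the container status itself; the kubelet flips started to true once the startup probe succeeds (pod name elided):

# Prints "true" once the startup probe has succeeded for the first container.
k get pod <> -o jsonpath='{.status.containerStatuses[0].started}'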