I've got a Next.js app with two simple readiness and liveness endpoints that have the following implementation:
return res.status(200).send('OK');
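For reference, the full handler is roughly this (the file path and handler name below are assumptions for illustration, not taken from the actual project):

// pages/api/readiness.js (hypothetical path; liveness.js is identical)
export default function handler(req, res) {
  // No dependency checks; just report that the process is up and serving.
  return res.status(200).send('OK');
}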
I've created the endpoints as per the API routes docs. I've also configured a /stats basePath as per the docs here, so the probe endpoints are at /stats/api/readiness and /stats/api/liveness.
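For completeness, the basePath setup would look something like this (a sketch of the standard next.config.js option, not the project's actual config file):

// next.config.js (assumed)
module.exports = {
  basePath: '/stats', // every page and API route is served under /stats
};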
When I build and run the app in a Docker container locally, the probe endpoints are accessible and return 200 OK.
When I deploy the app to my k8s cluster, though, the probes fail. There's plenty of initialDelaySeconds time, so that's not the cause.
I connect to the pod's service through port-forward, and when the pod has just started, before it fails, I can hit the endpoint and it returns 200 OK. A bit later it starts failing as usual.
I also tried accessing the failing pod through a healthy pod:
k exec -t [healthy pod name] -- curl -l 10.133.2.35:8080/stats/api/readiness
Same situation: at first, while the pod hasn't failed yet, the curl command returns 200 OK, and a bit later it starts failing.
The error I get from the probes is:
Readiness probe failed: Get http://10.133.2.35:8080/stats/api/readiness: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Funny experiment: I tried pointing the probes at a random, non-existent endpoint and got the same error, which leads me to think the probes fail because they cannot reach the proper endpoints at all.
But then again, the endpoints are accessible for a while before the probes start failing, so I have literally no idea why this is happening.
Here is my k8s deployment config for the probes:
livenessProbe:
  httpGet:
    path: /stats/api/liveness
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 10
  timeoutSeconds: 3
  periodSeconds: 3
  successThreshold: 1
  failureThreshold: 5
readinessProbe:
  httpGet:
    path: /stats/api/readiness
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 10
  timeoutSeconds: 3
  periodSeconds: 3
  successThreshold: 1
  failureThreshold: 3
Update
I used curl -v as requested in the comments. The result is:
* Trying 10.133.0.12:8080...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Connected to 10.133.0.12 (10.133.0.12) port 8080 (#0)
> GET /stats/api/healthz HTTP/1.1
> Host: 10.133.0.12:8080
> User-Agent: curl/7.76.1
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< ETag: "2-nOO9QiTIwXgNtWtBJezz8kv3SLc"
< Content-Length: 2
< Date: Wed, 16 Jun 2021 18:42:23 GMT
< Connection: keep-alive
< Keep-Alive: timeout=5
<
{ [2 bytes data]
100 2 100 2 0 0 666 0 --:--:-- --:--:-- --:--:-- 666
* Connection #0 to host 10.133.0.12 left intact
OK%
Then, of course, once it starts failing, the result is:
* Trying 10.133.0.12:8080...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* connect to 10.133.0.12 port 8080 failed: Connection refused
* Failed to connect to 10.133.0.12 port 8080: Connection refused
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
* Closing connection 0
curl: (7) Failed to connect to 10.133.0.12 port 8080: Connection refused
command terminated with exit code 7
The error tells you: Client.Timeout exceeded while awaiting headers. Meaning the TCP connection is established (neither refused nor timing out); the app just doesn't return response headers within the probe timeout.
Your liveness/readiness probe timeout is too low. Your application doesn't have enough time to respond.
This could be due to CPU or memory allocations being smaller than on your laptop, to higher concurrency, or maybe to a LimitRange that sets defaults because you did not set any yourself.
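If a LimitRange (or missing requests) is the culprit, setting explicit resources on the container overrides those defaults. A sketch, with purely illustrative values you would tune for your workload:

resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 512Mi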
Check with:
time kubectl exec -t [healthy pod name] -- curl -l 127.0.0.1:8080/stats/api/readiness
If you can't allocate more CPU, double the time that command reports, round it up, and fix your probes:
livenessProbe:
  ...
  timeoutSeconds: 10
readinessProbe:
  ...
  timeoutSeconds: 10
Alternatively, though probably less in the spirit of a health check, you could replace those httpGet checks with tcpSocket ones, as sketched below. They would be faster, though they may miss actual application-level issues.
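A minimal sketch of such a probe (the delays and thresholds are carried over from your config as assumptions; a tcpSocket check only verifies that something accepts connections on port 8080, not that Next.js can actually serve a request):

readinessProbe:
  tcpSocket:
    port: 8080
  initialDelaySeconds: 10
  timeoutSeconds: 3
  periodSeconds: 3
  successThreshold: 1
  failureThreshold: 3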