We have a bunch of CronJobs in our environment, and they run linkerd-proxy as a sidecar. Somewhat often (but not always), the proxy container fails after the main container is done. We "think" it might be due to open connections, but only because we've read that can cause it; we don't have any real evidence.
In the end, though, we don't really care why. We just don't want the failed linkerd-proxy to cause the Job to fail (and fire an alarm). I found the docs on podFailurePolicy, but there are only two examples and no links to more detail on the policy's format.
One of the examples shows how to ignore failures with certain exit codes from a container. But how would I say "all exit codes"? Bonus points if you know where the docs for the policy in general live, because I just can't seem to find anything on them.
Edit: Looking closer at the podFailurePolicy docs, I think it doesn't even do what I want: it just stops the failure from counting against the backoff limit and reruns the Job. But I'd still love to know the answer to the question anyway. :)
I think combining the Ignore action with the NotIn operator could achieve what you want?
podFailurePolicy:
  rules:
  - action: Ignore
    onExitCodes:
      containerName: linkerd-proxy
      operator: NotIn
      values: [0]
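For context, here's a minimal sketch of where that field sits in a full Job spec; the names and images are placeholders, and I'm assuming a Kubernetes version where podFailurePolicy is available (it was alpha in 1.25 and beta in 1.26). Note that podFailurePolicy can only be used when the pod template's restartPolicy is Never. As for the bonus question: the batch/v1 Job API reference describes the rule format, and kubectl explain job.spec.podFailurePolicy --recursive will dump the schema straight from your cluster.

apiVersion: batch/v1
kind: Job
metadata:
  name: example-job                # placeholder name
spec:
  backoffLimit: 3
  podFailurePolicy:
    rules:
    # Any non-zero exit code from linkerd-proxy is ignored: it doesn't
    # count against backoffLimit, and a replacement Pod is created.
    - action: Ignore
      onExitCodes:
        containerName: linkerd-proxy
        operator: NotIn
        values: [0]
  template:
    spec:
      restartPolicy: Never         # required when podFailurePolicy is set
      containers:
      - name: main                 # placeholder for your actual workload
        image: busybox:1.36
        command: ["sh", "-c", "echo done"]
      # In practice the linkerd-proxy container is injected by linkerd rather
      # than declared by hand; shown here only so containerName resolves.
      - name: linkerd-proxy
        image: cr.l5d.io/linkerd/proxy:stable-2.x   # placeholder tag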
Otherwise, I would probably advise simply creating a custom liveness probe for the linkerd-proxy container that always succeeds, so Kubernetes sees it as healthy even when it "fails"?
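If you go that route, an always-succeeding probe could look like the sketch below (assuming the proxy image ships a true binary, which is worth verifying, since the linkerd images are minimal). One caveat: a liveness probe only governs restarts while the container is running; I don't believe it changes the exit code the Job sees once the container terminates.

    livenessProbe:
      exec:
        command: ["true"]   # always exits 0, so the probe always passes
      periodSeconds: 10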