Tags: nginx, kubernetes, openconnect

Kubernetes liveness probe restarts the pod, which ends in CrashLoopBackOff


I have a Deployment with 2 replicas of an nginx-with-openconnect VPN proxy container (each pod has only one container).

They start without any problems and everything works, but once the VPN connection crashes and my liveness probe fails, the nginx container is restarted and ends up in CrashLoopBackOff, because both openconnect and nginx fail on restart with:

nginx:

host not found in upstream "example.server.org" in /etc/nginx/nginx.conf:11

openconnect:

getaddrinfo failed for host 'vpn.server.com': Temporary failure in name resolution
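For context, here is a minimal sketch of the kind of Deployment and liveness probe involved (the image name, probe type, port, and timings are placeholders, not the real manifest):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-vpn-proxy             # hypothetical name
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: nginx-vpn-proxy
      template:
        metadata:
          labels:
            app: nginx-vpn-proxy
        spec:
          restartPolicy: Always         # the default; only the container is restarted
          containers:
          - name: nginx-openconnect     # single container running openconnect + nginx
            image: example/nginx-openconnect:latest   # hypothetical image
            ports:
            - containerPort: 80
            livenessProbe:              # when this fails, the kubelet restarts the container
              httpGet:
                path: /
                port: 80
              initialDelaySeconds: 30
              periodSeconds: 10
              failureThreshold: 3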

It seems that /etc/resolv.conf is edited by openconnect and stays modified after the restart (although it is not part of a persistent volume). I believed the whole container would be run from a clean Docker image, where /etc/resolv.conf is not modified. Right?

The only way to fix the CrashLoopBackOff is to delete the pod, after which the Deployment's ReplicaSet creates a new pod that works.

How is creating a new pod different from having the container in the pod restarted by the liveness probe (restartPolicy: Always)? Is the restarted container started from a clean image?


Solution

  • restartPolicy applies to all containers in the Pod, not to the Pod itself; Pods are usually only re-created when someone explicitly deletes them (see the manifest snippet below).

    I think this explains why the restarted container with the bad resolv.conf fails but a new pod works.

    A "restarted container" is just that, it is not spawned new from the downloaded docker image. It is like killing a process and starting it - the file system for the new process is the same one the old process was updating. But a new pod will create a new container with a local file system view identical to the one packaged in the downloaded docker image - fresh start.