kubernetes google-kubernetes-engine kubernetes-ingress ingress-controller ingress-nginx

Ingress-nginx is in CrashLoopBackOff after K8s upgrade


After upgrading the GKE node pool from Kubernetes 1.21 to 1.22, the ingress-nginx-controller pods started crashing. The same deployment has been working fine in EKS; I'm only seeing this issue in GKE. Does anyone have any ideas about the root cause?

$ kubectl logs ingress-nginx-controller-5744fc449d-8t2rq -c controller

-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.3.1
  Build:         92534fa2ae799b502882c8684db13a25cde68155
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.19.10

-------------------------------------------------------------------------------

W0219 21:23:08.194770       8 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0219 21:23:08.194995       8 main.go:209] "Creating API client" host="https://10.1.48.1:443"

Ingress pod events:

Events:
  Type     Reason             Age                  From               Message
  ----     ------             ----                 ----               -------
  Normal   Scheduled          27m                  default-scheduler  Successfully assigned infra/ingress-nginx-controller-5744fc449d-8t2rq to gke-infra-nodep-ffe54a41-s7qx
  Normal   Pulling            27m                  kubelet            Pulling image "registry.k8s.io/ingress-nginx/controller:v1.3.1@sha256:54f7fe2c6c5a9db9a0ebf1131797109bb7a4d91f56b9b362bde2abd237dd1974"
  Normal   Started            27m                  kubelet            Started container controller
  Normal   Pulled             27m                  kubelet            Successfully pulled image "registry.k8s.io/ingress-nginx/controller:v1.3.1@sha256:54f7fe2c6c5a9db9a0ebf1131797109bb7a4d91f56b9b362bde2abd237dd1974" in 6.443361484s
  Warning  Unhealthy          26m (x6 over 26m)    kubelet            Readiness probe failed: HTTP probe failed with statuscode: 502
  Normal   Killing            26m                  kubelet            Container controller failed liveness probe, will be restarted
  Normal   Created            26m (x2 over 27m)    kubelet            Created container controller
  Warning  FailedPreStopHook  26m                  kubelet            Exec lifecycle hook ([/wait-shutdown]) for Container "controller" in Pod "ingress-nginx-controller-5744fc449d-8t2rq_infra(c4c166ff-1d86-4385-a22c-227084d569d6)" failed - error: command '/wait-shutdown' exited with 137: , message: ""
  Normal   Pulled             26m                  kubelet            Container image "registry.k8s.io/ingress-nginx/controller:v1.3.1@sha256:54f7fe2c6c5a9db9a0ebf1131797109bb7a4d91f56b9b362bde2abd237dd1974" already present on machine
  Warning  BackOff            7m7s (x52 over 21m)  kubelet            Back-off restarting failed container
  Warning  Unhealthy          2m9s (x55 over 26m)  kubelet            Liveness probe failed: HTTP probe failed with statuscode: 502

Solution

  • The beta Ingress API versions (extensions/v1beta1 and networking.k8s.io/v1beta1) are no longer served (they were removed) on GKE clusters running version 1.22 and later. Refer to the official GKE Ingress documentation for the changes in the GA API version.

    Also refer to Official Kubernetes documentation for API removals for Kubernetes v1.22 for more information.

    Before upgrading your Ingress API as a client, make sure that every ingress controller that you use is compatible with the v1 Ingress API. See Ingress Prerequisites for more context about Ingress and ingress controllers.
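
    For illustration, a minimal Ingress manifest rewritten against the v1 API might look like the sketch below (the name, host, and backend Service are placeholders). The main differences from v1beta1 are the required pathType, the nested backend.service block, and ingressClassName replacing the kubernetes.io/ingress.class annotation:

      apiVersion: networking.k8s.io/v1      # previously networking.k8s.io/v1beta1
      kind: Ingress
      metadata:
        name: example-ingress               # placeholder name
      spec:
        ingressClassName: nginx             # replaces the kubernetes.io/ingress.class annotation
        rules:
        - host: example.com                 # placeholder host
          http:
            paths:
            - path: /
              pathType: Prefix              # pathType is now required
              backend:
                service:                    # previously serviceName / servicePort
                  name: example-service     # placeholder backend Service
                  port:
                    number: 80

    Any manifests still stored as v1beta1 in CI/CD pipelines or Helm charts need to be updated the same way before the upgrade.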

    Also check the following possible causes of CrashLoopBackOff:

    1. Increasing the initialDelaySeconds value of the livenessProbe may alleviate the issue, since it gives the container more time to start up and finish its initial work before the kubelet begins checking its health (see the sketch after this list).

    2. Check the container restart policy: a Pod spec has a restartPolicy field with possible values Always, OnFailure, and Never; the default is Always.

    3. Out of memory or insufficient resources: try increasing the node (VM) size or the container's resource limits. Containers that hit their memory limit are OOM-killed; while replacements spin up, the health checks fail and the Ingress serves 502s (the sketch after this list also shows where resource limits go).

    4. Check whether externalTrafficPolicy=Local is set on the NodePort Service; it prevents nodes from forwarding traffic to other nodes, so requests that land on a node without a ready controller pod are dropped (a Service sketch follows at the end of this answer).
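
    As a rough sketch of points 1 and 3, the fragment below shows where the probe delay and resource limits live in the controller Deployment's pod template. The path /healthz and port 10254 are the controller's usual health endpoint, but verify them against your deployed manifest; the numbers are illustrative, not ingress-nginx defaults:

      containers:
      - name: controller
        image: registry.k8s.io/ingress-nginx/controller:v1.3.1
        livenessProbe:
          httpGet:
            path: /healthz
            port: 10254
          initialDelaySeconds: 30   # example: raised to give the controller more start-up time
          periodSeconds: 10
          failureThreshold: 5
        readinessProbe:
          httpGet:
            path: /healthz
            port: 10254
          initialDelaySeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 256Mi
          limits:
            memory: 512Mi           # example: raise this if the container is being OOM-killed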

    Refer to the GitHub issue Document how to avoid 502s #34 for more information.
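
    For point 4, a minimal sketch of where externalTrafficPolicy sits on the controller Service (names, labels, and ports are illustrative and should match your installation):

      apiVersion: v1
      kind: Service
      metadata:
        name: ingress-nginx-controller
        namespace: infra
      spec:
        type: NodePort
        externalTrafficPolicy: Local   # keeps traffic on the node that received it; nodes without a ready controller pod drop requests
        selector:
          app.kubernetes.io/name: ingress-nginx
          app.kubernetes.io/component: controller
        ports:
        - name: http
          port: 80
          targetPort: http
        - name: https
          port: 443
          targetPort: https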