google-kubernetes-engine, kubernetes-ingress

GKE Ingress shows unhealthy backend services


I have a GKE cluster with 4 nodes in an instance group. I deployed an Ingress and several pods (1 replica of each, so each pod runs on only 1 node). On the Google Console (Ingress details page) I notice that all backend services remain Unhealthy, although the healthchecks on the running pods are OK and my application is running. My understanding is that they show as unhealthy because, out of the 4 nodes, only 1 node is running an instance of a given pod (on the Back-end service details it says "1 of 4 instances healthy"). Am I correct, and should I worry and try to fix this? It's a bit strange to accept an Unhealthy status when the application is running...

Edit: After further investigation (scaling down to 2 nodes and activating the healthcheck logs), I can see that the backend service status seems to be the status of the last executed healthcheck. So if the last check hits the node that hosts the pod, the status is healthy; otherwise it is unhealthy.
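
You can also inspect the per-node health directly with gcloud (a sketch using the unhealthy backend name from my ingress annotations below; substitute your own backend service name, and note these backends are global for an external GKE Ingress):

  # show health status per backend instance group / node
  gcloud compute backend-services get-health k8s-be-30301--503461913abc33d7 --global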

GKE version: 1.16.13-gke.1

My ingress definition:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    ingress.gcp.kubernetes.io/pre-shared-cert: mcrt-dc729887-5c67-4388-9327-e4f76baf9eaf
    ingress.kubernetes.io/backends: '{"k8s-be-30301--503461913abc33d7":"UNHEALTHY","k8s-be-31206--503461913abc33d7":"HEALTHY","k8s-be-31253--503461913abc33d7":"HEALTHY","k8s-be-31267--503461913abc33d7":"HEALTHY","k8s-be-31432--503461913abc33d7":"UNHEALTHY","k8s-be-32238--503461913abc33d7":"HEALTHY","k8s-be-32577--503461913abc33d7":"UNHEALTHY","k8s-be-32601--503461913abc33d7":"UNHEALTHY"}'
    ingress.kubernetes.io/https-forwarding-rule: k8s2-fs-sfdowd2x-city-foobar-cloud-8cfrc00p
    ingress.kubernetes.io/https-target-proxy: k8s2-ts-sfdowd2x-city-foobar-cloud-8cfrc00p
    ingress.kubernetes.io/ssl-cert: mcrt-dc729887-5c67-4388-9327-e4f76baf9eaf
    ingress.kubernetes.io/url-map: k8s2-um-sfdowd2x-city-foobar-cloud-8cfrc00p
    kubernetes.io/ingress.allow-http: "false"
    kubernetes.io/ingress.global-static-ip-name: city
    networking.gke.io/managed-certificates: foobar-cloud
  creationTimestamp: "2020-08-06T08:25:18Z"
  finalizers:
  - networking.gke.io/ingress-finalizer-V2
  generation: 1
  labels:
    app.kubernetes.io/instance: foobar-cloud
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: foobar-cloud
    helm.sh/chart: foobar-cloud-0.4.58
  name: foobar-cloud
  namespace: city
  resourceVersion: "37878"
  selfLink: /apis/extensions/v1beta1/namespaces/city/ingresses/foobar-cloud
  uid: 751f78cf-2344-46e3-b87e-04d6d903acd5
spec:
  rules:
  - http:
      paths:
      - backend:
          serviceName: foobar-cloud-server
          servicePort: 9999
        path: /foobar/server
      - backend:
          serviceName: foobar-cloud-server
          servicePort: 9999
        path: /foobar/server/*
status:
  loadBalancer:
    ingress:
    - ip: xx.xx.xx.xx

Solution

  • I finally found out the cause of this.
    My services did not specify any value for externalTrafficPolicy, so the default value of Cluster applied. With that policy, a health check arriving on any node can be forwarded to the node that actually hosts the pod, so the traffic appears to come from another node's IP.
    However, I have a NetworkPolicy defined whose goal was to prevent traffic from other namespaces, as described here. I had added the IPs of the load balancer probes as stated in this doc, but was missing a rule allowing connections from the other node IPs in the cluster (see the sketch below).
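
    A minimal sketch of the kind of NetworkPolicy rule that fixed it for me. The load balancer health check ranges (130.211.0.0/22 and 35.191.0.0/16) are Google's documented probe CIDRs; the node CIDR and policy name are just examples from my setup, yours will differ:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-lb-probes-and-nodes   # example name
      namespace: city
    spec:
      podSelector: {}                   # applies to all pods in the namespace
      policyTypes:
      - Ingress
      ingress:
      - from:
        # Google Cloud load balancer health check ranges
        - ipBlock:
            cidr: 130.211.0.0/22
        - ipBlock:
            cidr: 35.191.0.0/16
        # node IPs of the cluster -- 10.132.0.0/20 is only an example,
        # use your cluster's actual node subnet
        - ipBlock:
            cidr: 10.132.0.0/20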