Tags: google-cloud-platform, google-kubernetes-engine, kubernetes-ingress, gke-networking, ingress-controller

GKE Ingress: missing network endpoints / backends with certain pods


I've been using GKE for a little over a month now.

I successfully deployed the hello-app example from the GCP docs [1] with Cloud DNS and a static external IP, and everything worked seamlessly: the deployment, pods, services, backends, ingress, URL map, target proxy, network endpoint group (automatically populated with network endpoints pointing to the backends), and external HTTP load balancer were all created just fine. The key thing I'd like to highlight is that the NEG, network endpoints, backends, and ingress all agree with each other and fall into place nicely (one odd thing is that only 2 of the 3 pods show a readiness gate of 1/1, indicating the cloud.google.com/load-balancer-neg-ready condition is True):

$ kubectl get po -o wide

NAME                             READY   STATUS    RESTARTS   AGE     IP          NODE                                             NOMINATED NODE   READINESS GATES
hello-app-b5cd5796b-vpgch        1/1     Running   0          3h49m   10.84.0.7   gke-hello-cluster-default-pool-479be6c8-4nbf     <none>           <none>
hello-app-b5cd5796b-wc2sw        1/1     Running   0          3h46m   10.84.2.5   gke-hello-cluster-default-pool-479be6c8-qbj6     <none>           1/1
hello-app-b5cd5796b-wd45s        1/1     Running   0          3h46m   10.84.1.5   gke-hello-cluster-default-pool-479be6c8-9kd2     <none>           1/1


$ kubectl describe ing hello-app

...
Default backend:  hello-app:80 (10.84.0.7:8080,10.84.1.5:8080,10.84.2.5:8080)
Rules:
  Host        Path  Backends
  ----        ----  --------
  *           *     hello-app:80 (10.84.0.7:8080,10.84.1.5:8080,10.84.2.5:8080)
Annotations:  ingress.kubernetes.io/backends: {"k8s1-3c58baf8-default-hello-app-80-f0ce4cea":"Unknown"}
              ingress.kubernetes.io/forwarding-rule: k8s2-fr-s93ji73w-default-hello-app-vz5ktzo8
              ingress.kubernetes.io/target-proxy: k8s2-tp-s93ji73w-default-hello-app-vz5ktzo8
              ingress.kubernetes.io/url-map: k8s2-um-s93ji73w-default-hello-app-vz5ktzo8
              kubernetes.io/ingress.class: gce
              kubernetes.io/ingress.global-static-ip-name: web-static-ip
...
$ gcloud compute backend-services list

NAME                                             BACKENDS                                                                             PROTOCOL
k8s1-3c58baf8-default-hello-app-80-f0ce4cea      us-central1-f/networkEndpointGroups/k8s1-3c58baf8-default-hello-app-80-f0ce4cea      HTTP


$ gcloud compute network-endpoint-groups list 

NAME                                             LOCATION       ENDPOINT_TYPE   SIZE
k8s1-3c58baf8-default-hello-app-80-f0ce4cea      us-central1-f  GCE_VM_IP_PORT  3


$ gcloud compute network-endpoint-groups list-network-endpoints k8s1-3c58baf8-default-hello-app-80-f0ce4cea

INSTANCE                                      IP_ADDRESS  PORT  FQDN
gke-hello-cluster-default-pool-479be6c8-4nbf  10.84.0.7   8080
gke-hello-cluster-default-pool-479be6c8-9kd2  10.84.1.5   8080
gke-hello-cluster-default-pool-479be6c8-qbj6  10.84.2.5   8080

$ for i in `seq 1 100`; do \
    curl --connect-timeout 1 -s http://{REDACTED} && echo; \
  done | grep Hostname | sort | uniq -c

  34 Hostname: hello-app-b5cd5796b-vpgch
  41 Hostname: hello-app-b5cd5796b-wc2sw
  25 Hostname: hello-app-b5cd5796b-wd45s

I tried a nearly identical set of steps with the same container image for the deployment but with a Google-managed certificate for SSL [2], and everything also worked as expected.
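For context, the managed-certificate setup in [2] boils down to a ManagedCertificate resource along these lines, which the ingress then references via the networking.gke.io/managed-certificates annotation (the name matches what I use for the second app below; the domain here is a placeholder, and on older GKE versions the apiVersion may be networking.gke.io/v1beta2):

apiVersion: networking.gke.io/v1
kind: ManagedCertificate
metadata:
  name: frontend-managed-cert
spec:
  domains:
    - frontend.example.com   # placeholder domain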

However, when I attempted the same thing with another application of mine that runs a different container image, just about everything works, except that the backends show up as <none> when I describe the ingress, the pods' readiness gates show up as <none>, and not a single network endpoint ever gets attached to the NEG created for this ingress:

$ kubectl get po -o wide

NAME                             READY   STATUS    RESTARTS   AGE     IP          NODE                                             NOMINATED NODE   READINESS GATES
frontend-app-5db9697f5b-6cv2l    1/1     Running   0          58m     10.84.0.9   gke-hello-cluster-default-pool-479be6c8-4nbf     <none>           <none>
frontend-app-5db9697f5b-7vtst    1/1     Running   0          57m     10.84.2.7   gke-hello-cluster-default-pool-479be6c8-qbj6     <none>           <none>
frontend-app-5db9697f5b-8pck7    1/1     Running   0          57m     10.84.1.7   gke-hello-cluster-default-pool-479be6c8-9kd2     <none>           <none>


$ kubectl describe ing frontend-app             
...
Default backend:  frontend-svc:80 (<none>)
Rules:
  Host        Path  Backends
  ----        ----  --------
  *           *     frontend-svc:80 (<none>)
            
Annotations:  ingress.gcp.kubernetes.io/pre-shared-cert: mcrt-5eaf911d-8f2c-489a-9cb1-9c80b5bd92c2
              ingress.kubernetes.io/backends: {"k8s1-3c58baf8-default-frontend-svc-80-3abdf46d":"HEALTHY"}
              ingress.kubernetes.io/forwarding-rule: k8s2-fr-s93ji73w-default-frontend-app-65h98k56
              ingress.kubernetes.io/https-forwarding-rule: k8s2-fs-s93ji73w-default-frontend-app-65h98k56
              ingress.kubernetes.io/https-target-proxy: k8s2-ts-s93ji73w-default-frontend-app-65h98k56
              ingress.kubernetes.io/ssl-cert: mcrt-5eaf911d-8f2c-489a-9cb1-9c80b5bd92c2
              ingress.kubernetes.io/target-proxy: k8s2-tp-s93ji73w-default-frontend-app-65h98k56
              ingress.kubernetes.io/url-map: k8s2-um-s93ji73w-default-frontend-app-65h98k56
              kubernetes.io/ingress.class: gce
              kubernetes.io/ingress.global-static-ip-name: frontend-static-ip
              networking.gke.io/managed-certificates: frontend-managed-cert
...

$ gcloud compute network-endpoint-groups list 
NAME                                             LOCATION       ENDPOINT_TYPE   SIZE
k8s1-3c58baf8-default-frontend-svc-80-3abdf46d   us-central1-f  GCE_VM_IP_PORT  0
k8s1-3c58baf8-default-hello-app-80-f0ce4cea      us-central1-f  GCE_VM_IP_PORT  3

Note that the SIZE is 0 and stays that way unless I intervene. So I end up having to do the silly manual step of creating and attaching the network endpoints myself (screenshot captured from an earlier cluster I have since torn down):

[screenshot: manually attaching network endpoints to the NEG]
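For reference, the equivalent manual step from the CLI looks roughly like this (NEG name, zone, nodes, and pod IPs taken from the listings above; port 3000 is the container port behind frontend-svc) -- exactly the kind of per-pod bookkeeping the NEG controller is supposed to do for me:

$ gcloud compute network-endpoint-groups update k8s1-3c58baf8-default-frontend-svc-80-3abdf46d \
    --zone=us-central1-f \
    --add-endpoint="instance=gke-hello-cluster-default-pool-479be6c8-4nbf,ip=10.84.0.9,port=3000" \
    --add-endpoint="instance=gke-hello-cluster-default-pool-479be6c8-qbj6,ip=10.84.2.7,port=3000" \
    --add-endpoint="instance=gke-hello-cluster-default-pool-479be6c8-9kd2,ip=10.84.1.7,port=3000"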

This is obviously not how it's supposed to work, but it at least tells me that something about the container image running in my frontend-app pods somehow prevents the network endpoints from being created and attached to the NEG automatically. I've read through all the troubleshooting docs [3], but still no luck.
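(For anyone hitting the same symptom: a quick sanity check, which I didn't think to run at the time, is whether the Service actually selects any pods -- the NEG controller only attaches endpoints for pods listed in the Service's Endpoints object:

$ kubectl get endpoints frontend-svc
$ kubectl get pods -l app=frontend-app

If the first shows <none> and the second returns no pods, the Service selector doesn't match the pods' labels.)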

Edit: deployment, service, and ingress manifests attached below:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend-app
  namespace: default
spec:
  replicas: 3
  minReadySeconds: 30
  selector:
    matchLabels:
      run: frontend-app
  template:
    metadata:
      labels:
        run: frontend-app
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - image: docker.hub/project/frontend-app:latest
        imagePullPolicy: IfNotPresent
        name: frontend-app
        ports:
        - containerPort: 3000
          protocol: TCP
      nodeSelector:
        cloud.google.com/gke-nodepool: default-pool
apiVersion: v1
kind: Service
metadata:
  name: frontend-svc
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
spec:
  type: ClusterIP
  selector:
    app: frontend-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: frontend-app
  annotations:
    kubernetes.io/ingress.global-static-ip-name: "frontend-static-ip"
    networking.gke.io/managed-certificates: frontend-managed-cert
    kubernetes.io/ingress.class: "gce"
spec:
  defaultBackend:
    service:
      name: frontend-svc
      port:
        number: 80

Any GKE experts who can help enlighten me?

[1] https://cloud.google.com/kubernetes-engine/docs/tutorials/http-balancer

[2] https://cloud.google.com/kubernetes-engine/docs/how-to/managed-certs

[3] https://cloud.google.com/kubernetes-engine/docs/how-to/container-native-load-balancing#troubleshooting


Solution

  • It's your labels. The Deployment's pod template uses run: frontend-app, but the Service's selector is app: frontend-app, so the Service matches no pods, its Endpoints stay empty, and the NEG controller has nothing to attach to the NEG. Make the two labels agree and the endpoints, backends, and readiness gates will populate on their own.
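A minimal fix, assuming you keep the Deployment's run: frontend-app label, is to make the Service selector match it (only the selector changes from the manifest above):

apiVersion: v1
kind: Service
metadata:
  name: frontend-svc
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
spec:
  type: ClusterIP
  selector:
    run: frontend-app   # must match the Deployment's pod-template labels
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000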