google-cloud-platform, google-kubernetes-engine, google-deployment-manager

GCP GKE Ingress Health Checks


I have a deployment and service running in GKE using Deployment Manager. Everything about my service works correctly except that the ingress I am creating reports the service in a perpetually unhealthy state.

To be clear, everything about the deployment works except the health check (and, as a consequence, the ingress). This was working previously (circa late 2019); apparently about a year ago GKE added additional requirements for health checks on ingress target services, and I have been unable to make sense of them.

I have put an explicit health check on the service, and it reports healthy, but the ingress does not recognize it. The service uses a NodePort, the deployment also exposes containerPort 80, and the container does respond with HTTP 200 to requests on :80 locally, but clearly that is not helping in the deployed service.

The cluster itself is a nearly identical copy of the Deployment Manager example.

Here is the deployment:

- name: {{ DEPLOYMENT }}
  type: {{ CLUSTER_TYPE }}:{{ DEPLOYMENT_COLLECTION }}
  metadata:
    dependsOn:
    - {{ properties['clusterType'] }}
  properties:
    apiVersion: apps/v1
    kind: Deployment
    namespace: {{ properties['namespace'] | default('default') }}
    metadata:
      name: {{ DEPLOYMENT }}
      labels:
        app: {{ APP }}
        tier: resters
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: {{ APP }}
          tier: resters
      template:
        metadata:
          labels:
            app: {{ APP }}
            tier: resters
        spec:
          containers:
          - name: rester
            image: {{ IMAGE }}
            resources:
              requests:
                cpu: 100m
                memory: 250Mi
            ports:
            - containerPort: 80
            env:
            - name: GCP_PROJECT
              value: {{ PROJECT }}
            - name: SERVICE_NAME
              value: {{ APP }}
            - name: MODE
              value: rest
            - name: REDIS_ADDR
              value: {{ properties['memorystoreAddr'] }}

... the service:

- name: {{ SERVICE }}
  type: {{ CLUSTER_TYPE }}:{{ SERVICE_COLLECTION }}
  metadata:
    dependsOn:
    - {{ properties['clusterType'] }}
    - {{ APP }}-cluster-nodeport-firewall-rule
    - {{ DEPLOYMENT }}
  properties:
    apiVersion: v1
    kind: Service
    namespace: {{ properties['namespace'] | default('default') }}
    metadata:
      name: {{ SERVICE }}
      labels:
        app: {{ APP }}
        tier: resters
    spec:
      type: NodePort
      ports:
      - nodePort: {{ NODE_PORT }}
        port: {{ CONTAINER_PORT }}
        targetPort: {{ CONTAINER_PORT }}
        protocol: TCP
      selector:
        app: {{ APP }}
        tier: resters

... the explicit healthcheck:

- name: {{ SERVICE }}-healthcheck
  type: compute.v1.healthCheck
  metadata:
    dependsOn:
    - {{ SERVICE }}
  properties:
    name: {{ SERVICE }}-healthcheck
    type: HTTP
    httpHealthCheck:
      port: {{ NODE_PORT }}
      requestPath: /healthz
      proxyHeader: NONE
    checkIntervalSec: 10
    healthyThreshold: 2
    unhealthyThreshold: 3
    timeoutSec: 5

... the firewall rules:

- name: {{ CLUSTER_NAME }}-nodeport-firewall-rule
  type: compute.v1.firewall
  properties:
    name: {{ CLUSTER_NAME }}-nodeport-firewall-rule
    network: projects/{{ PROJECT }}/global/networks/default
    sourceRanges:
    - 130.211.0.0/22
    - 35.191.0.0/16
    targetTags:
    - {{ CLUSTER_NAME }}-node
    allowed:
    - IPProtocol: TCP
      ports:
      - 30000-32767
      - 80

Solution

  • You could try defining a readinessProbe on the container in your Deployment.

    The ingress also uses the readiness probe when it creates its health check (note that these health check probes come from outside of GKE).

    In my experience, these readiness probes work well at getting the ingress health checks to pass.

    To do this, create something like the following. This is a TCP probe; I have seen better results with TCP probes.

    readinessProbe:
      tcpSocket:
        port: 80
      initialDelaySeconds: 10
      periodSeconds: 10

    This probe will check port 80, which is the port the pod in this service uses, and it will also help the ingress configure its health check correctly.

    Here is some helpful documentation on how to create TCP readiness probes, which the ingress health check can be based on.
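    For context, here is a sketch of how the probe could slot into the container spec of the Deployment Manager template from the question (same placeholders as above; the probe values mirror the snippet and are a starting point, not tuned values):

    ```yaml
    spec:
      containers:
      - name: rester
        image: {{ IMAGE }}
        ports:
        - containerPort: 80
        # The GKE ingress controller derives its backend health check from
        # this probe; without one, the check defaults to GET / on the
        # serving port and expects an HTTP 200.
        readinessProbe:
          tcpSocket:
            port: 80
          initialDelaySeconds: 10
          periodSeconds: 10
    ```

    Since the question's compute.v1.healthCheck already targets /healthz, an httpGet probe against that path (path: /healthz, port: 80) would be an equally reasonable choice, and should lead the derived ingress health check to hit /healthz instead of /.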