I have a deployment and service running in GKE using Deployment Manager. Everything about my service works correctly except that the ingress I am creating reports the service in a perpetually unhealthy state.
To be clear, everything about the deployment works except the healthcheck (and as a consequence, the ingress). This was working previously (circa late 2019), and apparently about a year ago GKE added some additional requirements for healthchecks on ingress target services and I have been unable to make sense of them.
I have put an explicit health check on the service, and it reports healthy, but the ingress does not recognize it. The service is using a NodePort but also has containerPort 80 open on the deployment, and it does respond with HTTP 200 to requests on :80 locally, but clearly that is not helping in the deployed service.
The cluster itself is an almost nearly identical copy of the Deployment Manager example
Here is the deployment:
- name: {{ DEPLOYMENT }}
type: {{ CLUSTER_TYPE }}:{{ DEPLOYMENT_COLLECTION }}
metadata:
dependsOn:
- {{ properties['clusterType'] }}
properties:
apiVersion: apps/v1
kind: Deployment
namespace: {{ properties['namespace'] | default('default') }}
metadata:
name: {{ DEPLOYMENT }}
labels:
app: {{ APP }}
tier: resters
spec:
replicas: 1
selector:
matchLabels:
app: {{ APP }}
tier: resters
template:
metadata:
labels:
app: {{ APP }}
tier: resters
spec:
containers:
- name: rester
image: {{ IMAGE }}
resources:
requests:
cpu: 100m
memory: 250Mi
ports:
- containerPort: 80
env:
- name: GCP_PROJECT
value: {{ PROJECT }}
- name: SERVICE_NAME
value: {{ APP }}
- name: MODE
value: rest
- name: REDIS_ADDR
value: {{ properties['memorystoreAddr'] }}
... the service:
- name: {{ SERVICE }}
type: {{ CLUSTER_TYPE }}:{{ SERVICE_COLLECTION }}
metadata:
dependsOn:
- {{ properties['clusterType'] }}
- {{ APP }}-cluster-nodeport-firewall-rule
- {{ DEPLOYMENT }}
properties:
apiVersion: v1
kind: Service
namespace: {{ properties['namespace'] | default('default') }}
metadata:
name: {{ SERVICE }}
labels:
app: {{ APP }}
tier: resters
spec:
type: NodePort
ports:
- nodePort: {{ NODE_PORT }}
port: {{ CONTAINER_PORT }}
targetPort: {{ CONTAINER_PORT }}
protocol: TCP
selector:
app: {{ APP }}
tier: resters
... the explicit healthcheck:
- name: {{ SERVICE }}-healthcheck
type: compute.v1.healthCheck
metadata:
dependsOn:
- {{ SERVICE }}
properties:
name: {{ SERVICE }}-healthcheck
type: HTTP
httpHealthCheck:
port: {{ NODE_PORT }}
requestPath: /healthz
proxyHeader: NONE
checkIntervalSec: 10
healthyThreshold: 2
unhealthyThreshold: 3
timeoutSec: 5
... the firewall rules:
- name: {{ CLUSTER_NAME }}-nodeport-firewall-rule
type: compute.v1.firewall
properties:
name: {{ CLUSTER_NAME }}-nodeport-firewall-rule
network: projects/{{ PROJECT }}/global/networks/default
sourceRanges:
- 130.211.0.0/22
- 35.191.0.0/16
targetTags:
- {{ CLUSTER_NAME }}-node
allowed:
- IPProtocol: TCP
ports:
- 30000-32767
- 80
You could try to define a readinessProbe
on your container in your Deployment.
This is also a metric that the ingress uses to create health checks (note that these health checks probes come from outside of GKE)
And In my experience, these readiness probes work pretty well to get the ingress health checks to work,
To do this, you create something like this, this is a TCP Probe, I have seen better performance with TCP probes.
readinessProbe:
tcpSocket:
port: 80
initialDelaySeconds: 10
periodSeconds: 10
So this probe will check port: 80, which is the one I see is used by the pod in this service, and this will also help configure the ingress health check for a better result.
Here is some helpful documentation on how to create the TCP readiness probes which the ingress health check can be based on.