Sometimes (no specific correlation) Cloud Run fails to deploy a new revision because of issues with performing a container health check:
X Deploying... Deploying Revision. Waiting on revision <my_app>-s7px8.
- Creating Revision... Internal error occurred while performing container health check.
. Routing traffic...
I use this command to rollout:
gcloud alpha run services replace ...
Some google-specifc details from my cloudrun.yml
:
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
name: <my_app>
spec:
traffic:
- percent: 100
latestRevision: true
template:
metadata:
annotations:
autoscaling.knative.dev/minScale: '1'
autoscaling.knative.dev/maxScale: '40'
run.googleapis.com/vpc-access-connector: <my_app>-vpc-16gbps
run.googleapis.com/sandbox: gvisor
spec:
timeoutSeconds: 50
serviceAccountName: ...
containerConcurrency: 100
containers:
- image: ...
ports:
- containerPort: 8080
name: h2c
resources:
limits:
cpu: '4'
memory: 6Gi
{
"protoPayload": {
"@type": "type.googleapis.com/google.cloud.audit.AuditLog",
"status": {
"code": 13,
"message": "Ready condition status changed to False for Service <my_app> with message: Internal error occurred while performing container health check. Resource readiness deadline exceeded."
},
"serviceName": "run.googleapis.com",
"response": {
"apiVersion": "serving.knative.dev/v1",
"kind": "Service",
"status": {
"observedGeneration": 1,
"conditions": [
{
"type": "Ready",
"status": "False",
"message": "Internal error occurred while performing container health check. Resource readiness deadline exceeded.",
"lastTransitionTime": "2022-10-05T14:11:19.884568Z"
},
{
"type": "ConfigurationsReady",
"status": "Unknown",
"message": "Internal error occurred while performing container health check.",
"lastTransitionTime": "2022-10-05T14:00:50.398740Z"
},
{
"type": "RoutesReady",
"status": "False",
"reason": "RevisionFailed",
"message": "Revision '<my_app>-s7px8' is not ready and cannot serve traffic. Internal error occurred while performing container health check.",
"lastTransitionTime": "2022-10-05T14:11:19.884568Z"
}
],
"latestCreatedRevisionName": "<my_app>-s7px8",
},
"@type": "type.googleapis.com/google.cloud.run.v1.Service"
}
}
}
The command fails within ~15-20 sec. I don't know what probes are executed by GCP (looks like TCP with 4 min timeout), but it looks too early to be exhausted.
I don't have any custom startup probes, health checks, readiness/liveness probes, etc.
Does anybody face the same issue? Any ideas on where to look at?
Ok, that is what worked for me. I removed this line:
autoscaling.knative.dev/minScale: '1'
According to docs it's 0
by default.