I have created a GKE Autopilot cluster; however, when I create a StatefulSet with 3 replicas I get the following errors:
FailedScheduling 77s (x3 over 11m) gke.io/optimize-utilization-scheduler 0/4 nodes are available: 4 Insufficient cpu, 4 Insufficient memory. preemption: 0/4 nodes are available: 4 No preemption victims found for incoming pod.
FailedScaleUp 4m32s cluster-autoscaler Node scale up in zones us-central1-b associated with this pod failed: IP space exhausted. Pod is at risk of not being scheduled.
Of the 3 replicas, two are fully up and running and the third is stuck Pending with the following error:
kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-statefulset-0 2/2 Running 0 25m
nginx-statefulset-1 2/2 Running 0 24m
nginx-statefulset-2 0/2 Pending 0 10m
kubectl describe pod nginx-statefulset-2
Name: nginx-statefulset-2
Namespace: default
Priority: 0
Service Account: default
Node: <none>
Labels: app=nginx
apps.kubernetes.io/pod-index=2
autopilot.gke.io/allow-net-admin=true
controller-revision-hash=nginx-statefulset-6d59ffdd85
security.istio.io/tlsMode=istio
service.istio.io/canonical-name=nginx
service.istio.io/canonical-revision=latest
statefulset.kubernetes.io/pod-name=nginx-statefulset-2
Annotations: autopilot.gke.io/resource-adjustment:
{"input":{"initContainers":[{"limits":{"cpu":"2","memory":"1Gi"},"requests":{"cpu":"100m","memory":"128Mi"},"name":"istio-init"}],"contain...
autopilot.gke.io/warden-version: 2.9.52
istio.io/rev: default
kubectl.kubernetes.io/default-container: nginx
kubectl.kubernetes.io/default-logs-container: nginx
prometheus.io/path: /stats/prometheus
prometheus.io/port: 15020
prometheus.io/scrape: true
sidecar.istio.io/status:
{"initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["workload-socket","credential-socket","workload-certs","istio-env...
Status: Pending
SeccompProfile: RuntimeDefault
IP:
IPs: <none>
Controlled By: StatefulSet/nginx-statefulset
Init Containers:
istio-init:
Image: docker.io/istio/proxyv2:1.23.0
Port: <none>
Host Port: <none>
Args:
istio-iptables
-p
15001
-z
15006
-u
1337
-m
REDIRECT
-i
*
-x
-b
*
-d
15090,15021,15020
--log_output_level=default:info
Limits:
cpu: 100m
ephemeral-storage: 2Gi
memory: 128Mi
Requests:
cpu: 100m
ephemeral-storage: 2Gi
memory: 128Mi
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-62ch8 (ro)
Containers:
nginx:
Image: nginx:1.21
Port: 80/TCP
Host Port: 0/TCP
Limits:
cpu: 650m
ephemeral-storage: 1Gi
memory: 2Gi
Requests:
cpu: 650m
ephemeral-storage: 1Gi
memory: 2Gi
Environment: <none>
Mounts:
/usr/share/nginx/html from www (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-62ch8 (ro)
istio-proxy:
Image: docker.io/istio/proxyv2:1.23.0
Port: 15090/TCP
Host Port: 0/TCP
Args:
proxy
sidecar
--domain
$(POD_NAMESPACE).svc.cluster.local
--proxyLogLevel=warning
--proxyComponentLogLevel=misc:error
--log_output_level=default:info
Limits:
cpu: 100m
ephemeral-storage: 1Gi
memory: 128Mi
Requests:
cpu: 100m
ephemeral-storage: 1Gi
memory: 128Mi
Readiness: http-get http://:15021/healthz/ready delay=0s timeout=3s period=15s #success=1 #failure=4
Startup: http-get http://:15021/healthz/ready delay=0s timeout=3s period=1s #success=1 #failure=600
Environment:
PILOT_CERT_PROVIDER: istiod
CA_ADDR: istiod.istio-system.svc:15012
POD_NAME: nginx-statefulset-2 (v1:metadata.name)
POD_NAMESPACE: default (v1:metadata.namespace)
INSTANCE_IP: (v1:status.podIP)
SERVICE_ACCOUNT: (v1:spec.serviceAccountName)
HOST_IP: (v1:status.hostIP)
ISTIO_CPU_LIMIT: 1 (limits.cpu)
PROXY_CONFIG: {}
ISTIO_META_POD_PORTS: [
{"name":"web","containerPort":80,"protocol":"TCP"}
]
ISTIO_META_APP_CONTAINERS: nginx
GOMEMLIMIT: 134217728 (limits.memory)
GOMAXPROCS: 1 (limits.cpu)
ISTIO_META_CLUSTER_ID: Kubernetes
ISTIO_META_NODE_NAME: (v1:spec.nodeName)
ISTIO_META_INTERCEPTION_MODE: REDIRECT
ISTIO_META_WORKLOAD_NAME: nginx-statefulset
ISTIO_META_OWNER: kubernetes://apis/apps/v1/namespaces/default/statefulsets/nginx-statefulset
ISTIO_META_MESH_ID: cluster.local
TRUST_DOMAIN: cluster.local
Mounts:
/etc/istio/pod from istio-podinfo (rw)
/etc/istio/proxy from istio-envoy (rw)
/var/lib/istio/data from istio-data (rw)
/var/run/secrets/credential-uds from credential-socket (rw)
/var/run/secrets/istio from istiod-ca-cert (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-62ch8 (ro)
/var/run/secrets/tokens from istio-token (rw)
/var/run/secrets/workload-spiffe-credentials from workload-certs (rw)
/var/run/secrets/workload-spiffe-uds from workload-socket (rw)
Conditions:
Type Status
PodScheduled False
Volumes:
workload-socket:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
credential-socket:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
workload-certs:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
istio-envoy:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
istio-data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
istio-podinfo:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
metadata.labels -> labels
metadata.annotations -> annotations
istio-token:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 43200
istiod-ca-cert:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: istio-ca-root-cert
Optional: false
www:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: www-nginx-statefulset-2
ReadOnly: false
kube-api-access-62ch8:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: kubernetes.io/arch=amd64:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal TriggeredScaleUp 10m cluster-autoscaler pod triggered scale-up: [{https://www.googleapis.com/compute/v1/projects/meghdo-4567/zones/us-central1-c/instanceGroups/gk3-meghdo-cluster-nap-1qk1o6u2-36641132-grp 0->1 (max: 1000)} {https://www.googleapis.com/compute/v1/projects/meghdo-4567/zones/us-central1-b/instanceGroups/gk3-meghdo-cluster-nap-1qk1o6u2-d7e76a9c-grp 0->1 (max: 1000)}]
Warning FailedScaleUp 9m32s cluster-autoscaler Node scale up in zones us-central1-c, us-central1-b associated with this pod failed: IP space exhausted. Pod is at risk of not being scheduled.
Warning FailedScaleUp 4m32s cluster-autoscaler Node scale up in zones us-central1-b associated with this pod failed: IP space exhausted. Pod is at risk of not being scheduled.
Normal TriggeredScaleUp 3m44s cluster-autoscaler pod triggered scale-up: [{https://www.googleapis.com/compute/v1/projects/meghdo-4567/zones/us-central1-c/instanceGroups/gk3-meghdo-cluster-nap-su76u4nk-9d90d6ab-grp 0->1 (max: 1000)} {https://www.googleapis.com/compute/v1/projects/meghdo-4567/zones/us-central1-f/instanceGroups/gk3-meghdo-cluster-nap-su76u4nk-f792d666-grp 0->1 (max: 1000)}]
Warning FailedScheduling 77s (x3 over 11m) gke.io/optimize-utilization-scheduler 0/4 nodes are available: 4 Insufficient cpu, 4 Insufficient memory. preemption: 0/4 nodes are available: 4 No preemption victims found for incoming pod.
My nginx StatefulSet YAML:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nginx-statefulset
  namespace: default
  labels:
    app: nginx
spec:
  serviceName: "nginx"
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.21
          ports:
            - containerPort: 80
              name: web
          volumeMounts:
            - name: www
              mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
    - metadata:
        name: www
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
I checked the quotas under IAM & Admin and none of them are being exceeded; in fact, usage is well under 1% across the board.
I expect the GKE Autopilot cluster to scale up automatically.
You won't find this in the quotas page because it is not a quota issue. When you create a cluster, you specify an IPv4 range for Pods, and the error above says that this range has been exhausted.
To find this range, open the cluster in the Cloud Console and look for Cluster Pod IPv4 range (default). You can also read it from the CLI, as sketched below.
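For example (a sketch only; the cluster name and region below are inferred from the instance-group URLs in your events, so substitute your own values), gcloud container clusters describe can print the Pod CIDR and the name of the secondary range backing it:
# Show the cluster Pod IPv4 range and its secondary range name
gcloud container clusters describe meghdo-cluster \
    --region us-central1 \
    --format="value(ipAllocationPolicy.clusterIpv4CidrBlock,ipAllocationPolicy.clusterSecondaryRangeName)"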
You can add capacity by creating new secondary ranges on the subnet used by the cluster, attaching them to the cluster as Cluster Pod IPv4 ranges (additional), and then either creating a new node pool or enabling node auto-provisioning so that new nodes draw from the extra range. See https://cloud.google.com/kubernetes-engine/docs/how-to/multi-pod-cidr. A rough CLI sketch follows.
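As a rough sketch (the subnet name, range name, and CIDR are placeholders, and the --additional-pod-ipv4-ranges flag needs a reasonably recent gcloud): first add a secondary range to the cluster's subnet, then register it with the cluster as an additional Pod range. On Autopilot, node auto-provisioning is already enabled, so newly provisioned nodes can use the new range.
# 1. Add a secondary range to the subnet the cluster uses (placeholder subnet/CIDR)
gcloud compute networks subnets update CLUSTER_SUBNET \
    --region us-central1 \
    --add-secondary-ranges pods-extra=10.100.0.0/16
# 2. Attach it to the cluster as an additional Pod IPv4 range
gcloud container clusters update meghdo-cluster \
    --region us-central1 \
    --additional-pod-ipv4-ranges pods-extra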