I have an AWS managed node group that behaves unexpectedly when I set both desired size and minimum size to 0. I expected the managed node group to start with no nodes, but that once I tried to schedule a pod using a nodeSelector with the label eks.amazonaws.com/nodegroup: my-node-group-name, the cluster-autoscaler would set the desired size for the managed node group to 1 and a node would be booted.
However, the cluster-autoscaler logs indicate that the pending pod does not trigger a scale-up because it wouldn't be schedulable: pod didn't trigger scale-up (it wouldn't fit if a new node is added). When I manually set the desired size to 1 in the managed node group, though, the pod is scheduled successfully, so I know the nodeSelector works fine.
I thought this might be a labelling issue, as described here: , but I have the labels on my managed node groups set to be auto-discoverable.
spec:
  containers:
  - command:
    - ./cluster-autoscaler
    - --cloud-provider=aws
    - --namespace=kube-system
    - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster-name
    - --balance-similar-node-groups=true
    - --expander=least-waste
    - --logtostderr=true
    - --skip-nodes-with-local-storage=false
    - --skip-nodes-with-system-pods=false
    - --stderrthreshold=info
    - --v=4
I have set the same tags on the autoscaling group:
Key                                         Value                Tag new instances
eks:cluster-name                            my-cluster-name      Yes
eks:nodegroup-name                          my-node-group-name   Yes
k8s.io/cluster-autoscaler/enabled           true                 Yes
k8s.io/cluster-autoscaler/my-cluster-name   owned                Yes
kubernetes.io/cluster/my-cluster-name       owned                Yes
Am I missing something? Or is this expected behavior for setting desired size to 0?
Ugh, it turns out this is just an AWS incompatibility with the cluster-autoscaler that they don't tell you about. You can scale your managed node group down to zero, but without a workaround you can't scale it back up.
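To scale a node group up from 0, the cluster-autoscaler constructs a pseudo node based on the node group's specification, in this case the AWS autoscaling group. For the cluster-autoscaler to know which labels to put on that pseudo node when it checks whether a pod could be scheduled there, you need to add a specific tag to the node group's autoscaling group: k8s.io/cluster-autoscaler/node-template/label/<label-name>, with the label's value as the tag value. For the nodeSelector in the question, that means the tag key k8s.io/cluster-autoscaler/node-template/label/eks.amazonaws.com/nodegroup with the value my-node-group-name.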
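To make the "it wouldn't fit" message less mysterious, here is a rough Python model of that check. It is a simplified sketch, not the autoscaler's actual code: the function names are made up for illustration, and the real autoscaler also fills in some default labels that are left out here. The tag prefix k8s.io/cluster-autoscaler/node-template/label/ is the one the cluster-autoscaler's AWS provider documents for scale-from-zero.

```python
# Simplified model of the scale-from-zero check; NOT the autoscaler's real code.
# Function names are hypothetical and exist only for this illustration.
LABEL_TAG_PREFIX = "k8s.io/cluster-autoscaler/node-template/label/"


def template_labels(asg_tags):
    """Derive the pseudo node's labels from ASG tags carrying the prefix."""
    return {
        key[len(LABEL_TAG_PREFIX):]: value
        for key, value in asg_tags.items()
        if key.startswith(LABEL_TAG_PREFIX)
    }


def selector_matches(node_selector, node_labels):
    """A nodeSelector matches only if every key/value pair is on the node."""
    return all(node_labels.get(k) == v for k, v in node_selector.items())


# Tags from the question: none of them carries the node-template prefix.
asg_tags = {
    "eks:cluster-name": "my-cluster-name",
    "eks:nodegroup-name": "my-node-group-name",
    "k8s.io/cluster-autoscaler/enabled": "true",
    "k8s.io/cluster-autoscaler/my-cluster-name": "owned",
    "kubernetes.io/cluster/my-cluster-name": "owned",
}
node_selector = {"eks.amazonaws.com/nodegroup": "my-node-group-name"}

# With only the original tags, the pseudo node has no matching label.
before = selector_matches(node_selector, template_labels(asg_tags))

# Same ASG after adding the node-template label tag by hand.
asg_tags[LABEL_TAG_PREFIX + "eks.amazonaws.com/nodegroup"] = "my-node-group-name"
after = selector_matches(node_selector, template_labels(asg_tags))

print(before, after)  # False True
```

With only the tags from the question, the pseudo node carries no eks.amazonaws.com/nodegroup label, so the pod "wouldn't fit". A real node booted by the managed node group does get that label, which is why setting the desired size to 1 by hand works.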
Sadly, AWS does not add this tag to the autoscaling group for you, and it also does not propagate tags from the managed node group to the autoscaling group. The only way to make this work is to add the tag to the autoscaling group yourself after the managed node group has created it. The issue is tracked here.
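If you'd rather not click through the console, something like the following should work with boto3. Treat it as a sketch under a few assumptions: boto3 is installed, your credentials are allowed to call eks:DescribeNodegroup and autoscaling:CreateOrUpdateTags, and the helper names (node_template_tag, tag_managed_node_group) are just ones I made up. It looks up the autoscaling group behind the managed node group and adds the node-template label tag to it.

```python
LABEL_TAG_PREFIX = "k8s.io/cluster-autoscaler/node-template/label/"


def node_template_tag(asg_name, label, value):
    """Build the tag payload in the shape create_or_update_tags expects."""
    return {
        "ResourceId": asg_name,
        "ResourceType": "auto-scaling-group",
        "Key": LABEL_TAG_PREFIX + label,
        "Value": value,
        # Mirrors "Tag new instances: Yes" on the existing tags in the question.
        "PropagateAtLaunch": True,
    }


def tag_managed_node_group(cluster_name, nodegroup_name):
    """Tag every ASG that backs the given managed node group."""
    # Third-party dependency, imported here so the helper above stays dependency-free.
    import boto3

    eks = boto3.client("eks")
    autoscaling = boto3.client("autoscaling")

    nodegroup = eks.describe_nodegroup(
        clusterName=cluster_name, nodegroupName=nodegroup_name
    )["nodegroup"]

    for asg in nodegroup["resources"]["autoScalingGroups"]:
        autoscaling.create_or_update_tags(
            Tags=[
                node_template_tag(
                    asg["name"], "eks.amazonaws.com/nodegroup", nodegroup_name
                )
            ]
        )
```

For the setup in the question you would call tag_managed_node_group("my-cluster-name", "my-node-group-name"). Since the managed node group creates the autoscaling group, this has to be run again if the group is ever recreated.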