amazon-web-services kubernetes amazon-eks aws-sts

Cluster Autoscaler pod crashing with a timeout to sts.us-west-1.amazonaws.com


I am following this document to deploy the Cluster Autoscaler in EKS: https://docs.aws.amazon.com/eks/latest/userguide/autoscaling.html

The EKS version is 1.24. Public traffic to the open internet is allowed, and we have whitelisted the .amazonaws.com domain in the Squid proxy.

I suspect something is wrong with the role or policy configuration.

Error in pod:

F0208 05:39:52.442470 1 aws_cloud_provider.go:386] Failed to generate AWS EC2 Instance Types: WebIdentityErr: failed to retrieve credentials caused by: RequestError: send request failed caused by: Post "https://sts.us-west-1.amazonaws.com/": dial tcp 176.32.112.54:443: i/o timeout

The service account has the annotation in place to use the IAM role.

Output of kubectl describe for the cluster-autoscaler service account:

Name:                cluster-autoscaler
Namespace:           kube-system
Labels:              k8s-addon=cluster-autoscaler.addons.k8s.io
                     k8s-app=cluster-autoscaler
Annotations:         eks.amazonaws.com/role-arn: arn:aws:iam::<ID>:role/irsa-clusterautoscaler
Image pull secrets:  <none>
Mountable secrets:   <none>
Tokens:              <none>
Events:              <none>

Solution

  • It was solved by adding the proxy details to the container env of the deployment. This is missing from the official documentation, which could mention it as a hint. The pod does not pick up the proxy settings configured on the node; they have to be set explicitly on the container.
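
For anyone hitting the same timeout, the change looked roughly like the fragment below. This is a sketch, not the exact manifest: the proxy address proxy.example.internal:3128 is a made-up placeholder, the container name is assumed to match the stock Cluster Autoscaler manifest, and the NO_PROXY entries (including 172.20.0.0/16 as the cluster service CIDR) are assumptions you would need to adapt to your own cluster.

```yaml
# Fragment of the cluster-autoscaler Deployment in kube-system.
# proxy.example.internal:3128 is a hypothetical Squid address; replace it.
spec:
  template:
    spec:
      containers:
        - name: cluster-autoscaler   # assumed container name
          env:
            - name: HTTP_PROXY
              value: "http://proxy.example.internal:3128"
            - name: HTTPS_PROXY
              value: "http://proxy.example.internal:3128"
            - name: NO_PROXY
              # Assumed exclusions: keep API server, in-cluster, and
              # instance metadata traffic off the proxy.
              value: "localhost,127.0.0.1,169.254.169.254,.svc,.cluster.local,172.20.0.0/16"
```

With these variables set, the STS call in the error above should go through the proxy instead of dialing sts.us-west-1.amazonaws.com directly. The NO_PROXY list matters because the autoscaler also talks to the Kubernetes API server, which normally should not be routed through Squid.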