amazon-web-services kubernetes eksctl

AWS Enclaves with Amazon EKS times out during node creation


I'm following the tutorial "AWS Enclaves with Amazon EKS". Unfortunately, after around 35 minutes I run into the error "exceeded max wait time for StackCreateComplete waiter", and I don't know why.

It seems to get stuck while creating the managed nodegroup. Here are the last few lines of output in the terminal:

2023-03-18 23:24:20 [ℹ]  waiting for CloudFormation stack "eksctl-ne-cluster-nodegroup-managed-ng-1"
2023-03-18 23:25:33 [ℹ]  waiting for CloudFormation stack "eksctl-ne-cluster-nodegroup-managed-ng-1"
2023-03-18 23:27:26 [ℹ]  waiting for CloudFormation stack "eksctl-ne-cluster-nodegroup-managed-ng-1"
2023-03-18 23:28:55 [ℹ]  waiting for CloudFormation stack "eksctl-ne-cluster-nodegroup-managed-ng-1"
2023-03-18 23:30:30 [ℹ]  waiting for CloudFormation stack "eksctl-ne-cluster-nodegroup-managed-ng-1"
2023-03-18 23:32:15 [ℹ]  waiting for CloudFormation stack "eksctl-ne-cluster-nodegroup-managed-ng-1"
2023-03-18 23:33:09 [ℹ]  waiting for CloudFormation stack "eksctl-ne-cluster-nodegroup-managed-ng-1"
2023-03-18 23:34:36 [ℹ]  waiting for CloudFormation stack "eksctl-ne-cluster-nodegroup-managed-ng-1"
2023-03-18 23:36:17 [ℹ]  waiting for CloudFormation stack "eksctl-ne-cluster-nodegroup-managed-ng-1"
2023-03-18 23:37:13 [ℹ]  waiting for CloudFormation stack "eksctl-ne-cluster-nodegroup-managed-ng-1"
2023-03-18 23:38:31 [ℹ]  waiting for CloudFormation stack "eksctl-ne-cluster-nodegroup-managed-ng-1"
2023-03-18 23:38:53 [ℹ]  waiting for CloudFormation stack "eksctl-ne-cluster-nodegroup-managed-ng-1"
2023-03-18 23:38:53 [!]  1 error(s) occurred and cluster hasn't been created properly, you may wish to check CloudFormation console
2023-03-18 23:38:53 [ℹ]  to cleanup resources, run 'eksctl delete cluster --region=eu-central-1 --name=ne-cluster'
2023-03-18 23:38:53 [✖]  exceeded max wait time for StackCreateComplete waiter
Error: failed to create cluster "ne-cluster"

This is my launch template config:

{
    "ImageId": "ami-0499632f10efc5a62",
    "InstanceType": "m5.xlarge",
    "TagSpecifications": [{
        "ResourceType": "instance",
        "Tags": [{
            "Key":"Name",
            "Value":"webserver"
        }]
    }],
    "UserData":"TUlNRS1WZXJzaW9uOiAxLjAKQ29udGVudC1UeXBlOiBtdWx0aXBhcnQvbWl4ZWQ7IGJvdW5kYXJ5PSI9PU1ZQk9VTkRBUlk9PSIKCi0tPT1NWUJPVU5EQVJZPT0KQ29udGVudC1UeXBlOiB0ZXh0L3gtc2hlbGxzY3JpcHQ7IGNoYXJzZXQ9InVzLWFzY2lpIgoKIyEvYmluL2Jhc2ggLWUKcmVhZG9ubHkgTkVfQUxMT0NBVE9SX1NQRUNfUEFUSD0iL2V0Yy9uaXRyb19lbmNsYXZlcy9hbGxvY2F0b3IueWFtbCIKIyBOb2RlIHJlc291cmNlcyB0aGF0IHdpbGwgYmUgYWxsb2NhdGVkIGZvciBOaXRybyBFbmNsYXZlcwpyZWFkb25seSBDUFVfQ09VTlQ9MgpyZWFkb25seSBNRU1PUllfTUlCPTc2OAoKIyBUaGlzIHN0ZXAgYmVsb3cgaXMgbmVlZGVkIHRvIGluc3RhbGwgbml0cm8tZW5jbGF2ZXMtYWxsb2NhdG9yIHNlcnZpY2UuCmFtYXpvbi1saW51eC1leHRyYXMgaW5zdGFsbCBhd3Mtbml0cm8tZW5jbGF2ZXMtY2xpIC15CgojIFVwZGF0ZSBlbmNsYXZlJ3MgYWxsb2NhdG9yIHNwZWNpZmljYXRpb246IGFsbG9jYXRvci55YW1sCnNlZCAtaSAicy9jcHVfY291bnQ6LiovY3B1X2NvdW50OiAkQ1BVX0NPVU5UL2ciICRORV9BTExPQ0FUT1JfU1BFQ19QQVRICnNlZCAtaSAicy9tZW1vcnlfbWliOi4qL21lbW9yeV9taWI6ICRNRU1PUllfTUlCL2ciICRORV9BTExPQ0FUT1JfU1BFQ19QQVRICiMgUmVzdGFydCB0aGUgbml0cm8tZW5jbGF2ZXMtYWxsb2NhdG9yIHNlcnZpY2UgdG8gdGFrZSBjaGFuZ2VzIGVmZmVjdC4Kc3lzdGVtY3RsIHJlc3RhcnQgbml0cm8tZW5jbGF2ZXMtYWxsb2NhdG9yLnNlcnZpY2UKZWNobyAiTkUgdXNlciBkYXRhIHNjcmlwdCBoYXMgZmluaXNoZWQgc3VjY2Vzc2Z1bGx5LiIKLS09PU1ZQk9VTkRBUlk9PQ=="
 }

I can run the following command to create the launch template successfully:

aws ec2 create-launch-template \
    --launch-template-name TemplateForEnclaveServer \
    --version-description WebVersion1 \
    --tag-specifications 'ResourceType=launch-template,Tags=[{Key=purpose,Value=production}]' \
    --launch-template-data file://lt_nitro_config.json

In step two, the cluster creation fails. The configuration is the following:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: ne-cluster
  region: eu-central-1

managedNodeGroups:
  - name: managed-ng-1
    launchTemplate:
      id: lt-04c4d2c58e20db555
      version: "1" # optional (uses the default launch template version if unspecified)
    minSize: 1
    desiredCapacity: 1

Then I run the command:

eksctl create cluster -f cluster_nitro_config.yaml

The cluster creation works fine, but the managed nodegroup fails with the following output:


2023-03-18 23:38:53 [!]  1 error(s) occurred and cluster hasn't been created properly, you may wish to check CloudFormation console
2023-03-18 23:38:53 [ℹ]  to cleanup resources, run 'eksctl delete cluster --region=eu-central-1 --name=ne-cluster'
2023-03-18 23:38:53 [✖]  exceeded max wait time for StackCreateComplete waiter
Error: failed to create cluster "ne-cluster"

The CloudFormation console shows:

Resource handler returned message: "[Issue(Code=NodeCreationFailure, Message=Instances failed to join the kubernetes cluster, ResourceIds=[i-00bf11cb814138f64])] (Service: null, Status Code: 0, Request ID: null)" (RequestToken: 5772ff82-596e-3e57-eb8f-c7ae277f0df2, HandlerErrorCode: GeneralServiceException)

I have no idea what this means. Thanks in advance!


Solution

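  • Remove the ImageId from your launch template (LT) and try again. When a launch template specifies an AMI, EKS managed node groups do not merge their own bootstrap user data into it, so the instance boots but never joins the cluster, which matches the NodeCreationFailure above.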
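
The UserData in the question is base64-encoded. Decoding it shows a script that only installs and configures the Nitro Enclaves allocator and contains no call to /etc/eks/bootstrap.sh, which is consistent with the node never registering while a custom AMI is set. A minimal sketch of how the fix could be applied is shown below. It reuses the file name and launch template ID from the question; the function names, the output file lt_nitro_no_ami.json, and the version description are made up for illustration, and the sed-based editing assumes "ImageId" and "UserData" each sit on their own line, as they do in the question's file (jq 'del(.ImageId)' would be the more robust choice if jq is available).

```shell
#!/usr/bin/env bash
# Sketch only, not the tutorial's method. Function names and
# lt_nitro_no_ami.json are hypothetical; other names come from the question.

# Print the decoded user data of a launch template JSON file.
# Assumes the "UserData" key and its value are on a single line.
decode_user_data() {
    sed -n 's/.*"UserData" *: *"\([^"]*\)".*/\1/p' "$1" | base64 --decode
}

# Print the launch template JSON without its "ImageId" line.
# Assumes "ImageId" is on a line of its own.
strip_image_id() {
    sed '/"ImageId"/d' "$1"
}

# Register the AMI-less config as a new version of the existing template.
# Calls AWS, so it needs credentials and is not run automatically here.
publish_without_ami() {
    strip_image_id lt_nitro_config.json > lt_nitro_no_ami.json
    aws ec2 create-launch-template-version \
        --launch-template-id lt-04c4d2c58e20db555 \
        --version-description NoImageId \
        --launch-template-data file://lt_nitro_no_ami.json
}
```

Running decode_user_data lt_nitro_config.json prints the script for inspection, and publish_without_ami creates the new template version. The version field in cluster_nitro_config.yaml would then need to point at the version number returned by that call, and the failed cluster has to be removed first with the eksctl delete cluster command printed in the log.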