amazon-web-services, terraform, github-actions, amazon-eks, terraform-aws-modules

Terraform Destroy on EKS Fails within GitHub Actions Workflow


Terraform Destroy fails within our workflow:

GitHub Integration Action/Workflow:

name: 'integration'
on:
  push:
    branches:
      - '**'
      - '!main'
  workflow_dispatch:
permissions:
  id-token: write
  contents: read
  deployments: write
jobs:
  integration:
    runs-on: ubuntu-latest
    concurrency:
      group: canary
      cancel-in-progress: false
    defaults:
      run:
        working-directory: examples/complete/
    steps:
      - name: 'Checkout'
        uses: actions/checkout@v3.5.2
      - name: 'Extract branch name'
        id: extract_branch
        shell: bash
        run: echo "branch=${{ github.ref_name }}" >> "$GITHUB_OUTPUT"
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: arn:aws:iam::702906146300:role/terraform-aws-eks-primary
          aws-region: us-west-2
      - name: 'Setup Terraform'
        uses: hashicorp/setup-terraform@v2.0.3
        with:
          terraform_version: 1.3.7
      - name: 'Terraform Init'
        id: init
        run: terraform init -force-copy -input=false
      - name: 'Terraform Validate'
        id: validate
        run: terraform validate -no-color
      - name: 'Terraform Plan'
        id: plan
        run: terraform plan -var="create_cni_ipv6_iam_policy=true" -var="iam_role_attach_cni_policy=true"  -no-color -input=false
      - name: 'Start deployment'
        uses: bobheadxi/deployments@v1.4.0
        id: deployment
        with:
          step: start
          token: ${{ secrets.GITHUB_TOKEN }}
          env: canary
      - name: 'Terraform Apply'
        id: apply
        run: |
          terraform apply -var="create_cni_ipv6_iam_policy=true" -var="iam_role_attach_cni_policy=true"  -no-color -input=false -auto-approve
          terraform apply -no-color -input=false -auto-approve
      - name: 'Terraform Destroy'
        id: destroy
        if: always()
        run: terraform destroy  -no-color -input=false -auto-approve
      - name: 'Finish deployment'
        uses: bobheadxi/deployments@v1.4.0
        if: always()
        with:
          step: finish
          token: ${{ secrets.GITHUB_TOKEN }}
          status: ${{ job.status }}
          env: ${{ steps.deployment.outputs.env }}
          env_url: https://github.com/${{ github.repository }}/actions?query=workflow%3A${{ github.workflow }}+branch%3A${{ steps.extract_branch.outputs.branch }}
          deployment_id: ${{ steps.deployment.outputs.deployment_id }}

The command it fails on is:

terraform destroy  -no-color -input=false -auto-approve

I am specifically using a module to spin up EKS on AWS with terraform.

Module: https://github.com/terraform-aws-modules/terraform-aws-eks

I have tried using multiple versions, but have had very limited success. I don't think the problem is the Terraform code itself, but rather the module or the terraform command I am using to destroy the infrastructure. The EKS cluster does eventually get destroyed, but since I am letting the module manage the cluster's security groups, it seems to fail to actually delete those security groups due to a dependency created by the way EKS wires up VPC access for the cluster and nodes via the cluster and node security groups.

The error:

Error: deleting Security Group (sg-0b11ee4a81d0092b2): DependencyViolation: resource sg-0b11ee4a81d0092b2 has a dependent object
    status code: 400, request id: 8a8dfd26-5198-4bbd-9f0b-84131c248434
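
For reference, the network interfaces still holding on to that security group can be listed with the AWS CLI (assuming the CLI is configured for the same account and the us-west-2 region the workflow uses; the sg-... ID is the one from the error above):

# List any network interfaces still attached to the security group from the error
aws ec2 describe-network-interfaces \
  --filters Name=group-id,Values=sg-0b11ee4a81d0092b2 \
  --query 'NetworkInterfaces[].{Id:NetworkInterfaceId,Status:Status,Description:Description}' \
  --output table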

main.tf:

################################################
#          KMS CLUSTER ENCRYPTION KEY          #
################################################
resource "aws_kms_key" "this" {
  description             = "EKS Cluster Encryption Key"
  deletion_window_in_days = 7
  enable_key_rotation     = true
}

resource "aws_kms_alias" "this" {
  name          = "alias/eks_cluster_encryption_key"
  target_key_id = aws_kms_key.this.key_id
}

##################################
#       KUBERNETES CLUSTER       #
##################################
module "primary" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.4.3"

  cluster_name                    = var.cluster_name
  cluster_version                 = var.cluster_version
  cluster_endpoint_private_access = var.cluster_endpoint_private_access
  cluster_endpoint_public_access  = var.cluster_endpoint_public_access

  create_cloudwatch_log_group = false

  create_kms_key = false
  cluster_encryption_config = {
    resources        = ["secrets"]
    provider_key_arn = aws_kms_key.this.arn
  }

  create_cni_ipv6_iam_policy = var.create_cni_ipv6_iam_policy
  manage_aws_auth_configmap  = true
  aws_auth_roles             = var.aws_auth_roles

  vpc_id     = var.vpc_id
  subnet_ids = var.subnet_ids

  eks_managed_node_group_defaults = {
    ami_type       = var.ami_type
    disk_size      = var.disk_size
    instance_types = var.instance_types

    iam_role_attach_cni_policy = var.iam_role_attach_cni_policy
  }

  eks_managed_node_groups = {
    primary = {
      min_size     = 1
      max_size     = 5
      desired_size = 1

      capacity_type = "ON_DEMAND"
    }
    secondary = {
      min_size     = 1
      max_size     = 5
      desired_size = 1

      capacity_type = "SPOT"
    }
  }

  cluster_addons = {
    coredns = {
      most_recent = true

      resolve_conflicts_on_create = "OVERWRITE"
      resolve_conflicts_on_update = "PRESERVE"

      timeouts = {
        create = "20m"
        delete = "20m"
        update = "20m"
      }
    }
    kube-proxy = {
      most_recent                 = true
      resolve_conflicts_on_create = "OVERWRITE"
      resolve_conflicts_on_update = "PRESERVE"

      timeouts = {
        create = "20m"
        delete = "20m"
        update = "20m"
      }
    }
    aws-ebs-csi-driver = {
      most_recent                 = true
      resolve_conflicts_on_create = "OVERWRITE"
      resolve_conflicts_on_update = "PRESERVE"

      timeouts = {
        create = "20m"
        delete = "20m"
        update = "20m"
      }
    }
    vpc-cni = {
      most_recent                 = true
      resolve_conflicts_on_create = "OVERWRITE"
      resolve_conflicts_on_update = "PRESERVE"

      timeouts = {
        create = "20m"
        delete = "20m"
        update = "20m"
      }
    }
  }

  tags = {
    repo  = "https://github.com/impinj-di/terraform-aws-eks-primary"
    team  = "di"
    owner = "di_admins@impinj.com"
  }
}

####################################
#       KUBERNETES RESOURCES       #
####################################
resource "kubernetes_namespace" "this" {
  depends_on = [module.primary]
  for_each   = toset(local.eks_namespaces)
  metadata {
    name = each.key
  }
}

As you can see, I am not specifying the cluster and/or node-to-node security groups; in other words, we want the defaults.

Should I use the following within my workflow to destroy the infrastructure, or will this make a difference?

terraform destroy -force -no-color -input=false -auto-approve

Solution

  • Most probably there is an ENI (Elastic Network Interface) still attached to that security group, which prevents its deletion and produces that dependency error.

    Your fix is to identify the problematic ENI (for example from the AWS console or the AWS CLI), manually delete it, and re-run the destroy command; but that is only a temporary, one-time workaround for this error. A rough CLI sketch of that cleanup follows below.

    And there is no simple Terraform-only solution, since you didn't write the Terraform code yourself; you are using a ready-made module, which appears to be prone to the error you are reporting because of how dependencies are managed in the terraform-aws-eks module repository.
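
    As a rough sketch of that one-time cleanup (assuming the AWS CLI is available and configured for the same account and region, and that the leftover interfaces are truly orphaned rather than owned by another live resource), you could delete the dangling ENIs and then re-run the destroy:

    SG_ID=sg-0b11ee4a81d0092b2   # the security group from the error above

    # Find the ENIs still referencing the security group
    ENI_IDS=$(aws ec2 describe-network-interfaces \
      --filters Name=group-id,Values="$SG_ID" \
      --query 'NetworkInterfaces[].NetworkInterfaceId' \
      --output text)

    # Delete each dangling ENI; an "in-use" ENI must first be detached with
    # `aws ec2 detach-network-interface --attachment-id <id>` before it can be deleted
    for eni in $ENI_IDS; do
      aws ec2 delete-network-interface --network-interface-id "$eni"
    done

    # Then retry the destroy that previously failed
    terraform destroy -no-color -input=false -auto-approve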