Terraform Destroy fails within our workflow:
GitHub Integration Action/Workflow:
name: 'integration'
on:
  push:
    branches:
      - '**'
      - '!main'
  workflow_dispatch:
permissions:
  id-token: write
  contents: read
  deployments: write
jobs:
  integration:
    runs-on: ubuntu-latest
    concurrency:
      group: canary
      cancel-in-progress: false
    defaults:
      run:
        working-directory: examples/complete/
    steps:
      - name: 'Checkout'
        uses: actions/checkout@v3.5.2
      - name: 'Extract branch name'
        id: extract_branch
        shell: bash
        run: echo "branch=${{ github.ref_name }}" >> "$GITHUB_OUTPUT"
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: arn:aws:iam::702906146300:role/terraform-aws-eks-primary
          aws-region: us-west-2
      - name: 'Setup Terraform'
        uses: hashicorp/setup-terraform@v2.0.3
        with:
          terraform_version: 1.3.7
      - name: 'Terraform Init'
        id: init
        run: terraform init -force-copy -input=false
      - name: 'Terraform Validate'
        id: validate
        run: terraform validate -no-color
      - name: 'Terraform Plan'
        id: plan
        run: terraform plan -var="create_cni_ipv6_iam_policy=true" -var="iam_role_attach_cni_policy=true" -no-color -input=false
      - name: 'Start deployment'
        uses: bobheadxi/deployments@v1.4.0
        id: deployment
        with:
          step: start
          token: ${{ secrets.GITHUB_TOKEN }}
          env: canary
      - name: 'Terraform Apply'
        id: apply
        run: |
          terraform apply -var="create_cni_ipv6_iam_policy=true" -var="iam_role_attach_cni_policy=true" -no-color -input=false -auto-approve
          terraform apply -no-color -input=false -auto-approve
      - name: 'Terraform Destroy'
        id: destroy
        if: always()
        run: terraform destroy -no-color -input=false -auto-approve
      - name: 'Finish deployment'
        uses: bobheadxi/deployments@v1.4.0
        if: always()
        with:
          step: finish
          token: ${{ secrets.GITHUB_TOKEN }}
          status: ${{ job.status }}
          env: ${{ steps.deployment.outputs.env }}
          env_url: https://github.com/${{ github.repository }}/actions?query=workflow%3A${{ github.workflow }}+branch%3A${{ steps.extract_branch.outputs.branch }}
          deployment_id: ${{ steps.deployment.outputs.deployment_id }}
The command it fails on is:
terraform destroy -no-color -input=false -auto-approve
I am specifically using a module to spin up EKS on AWS with terraform.
Module: https://github.com/terraform-aws-modules/terraform-aws-eks
I have tried multiple versions of the module, but with very limited success. I don't think the problem is the Terraform configuration itself; it seems to be the module, or the terraform command I am using to destroy the infrastructure. The EKS cluster does eventually get destroyed, but because I am letting the module manage the security groups for the cluster, it fails to actually delete those security groups due to a dependency created by the way EKS is spun up, with the nodes and VPC access tied to the cluster and node security groups.
The error:
Error: deleting Security Group (sg-0b11ee4a81d0092b2): DependencyViolation: resource sg-0b11ee4a81d0092b2 has a dependent object
status code: 400, request id: 8a8dfd26-5198-4bbd-9f0b-84131c248434
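For reference, this is a quick way to see what is still holding on to that security group, a sketch that assumes the AWS CLI is configured for the same account and region as the cluster:

# Sketch: list the ENIs that still reference the security group from the error,
# with enough detail (status, description, attached instance) to see what created them.
# Assumes the AWS CLI is configured for the same account/region as the cluster.
aws ec2 describe-network-interfaces \
  --filters Name=group-id,Values=sg-0b11ee4a81d0092b2 \
  --query 'NetworkInterfaces[*].{Id:NetworkInterfaceId,Status:Status,Description:Description,Instance:Attachment.InstanceId}' \
  --output table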
main.tf:
################################################
#          KMS CLUSTER ENCRYPTION KEY          #
################################################
resource "aws_kms_key" "this" {
  description             = "EKS Cluster Encryption Key"
  deletion_window_in_days = 7
  enable_key_rotation     = true
}

resource "aws_kms_alias" "this" {
  name          = "alias/eks_cluster_encryption_key"
  target_key_id = aws_kms_key.this.key_id
}

##################################
#       KUBERNETES CLUSTER       #
##################################
module "primary" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.4.3"

  cluster_name                    = var.cluster_name
  cluster_version                 = var.cluster_version
  cluster_endpoint_private_access = var.cluster_endpoint_private_access
  cluster_endpoint_public_access  = var.cluster_endpoint_public_access
  create_cloudwatch_log_group     = false
  create_kms_key                  = false

  cluster_encryption_config = {
    resources        = ["secrets"]
    provider_key_arn = aws_kms_key.this.arn
  }

  create_cni_ipv6_iam_policy = var.create_cni_ipv6_iam_policy
  manage_aws_auth_configmap  = true
  aws_auth_roles             = var.aws_auth_roles
  vpc_id                     = var.vpc_id
  subnet_ids                 = var.subnet_ids

  eks_managed_node_group_defaults = {
    ami_type                   = var.ami_type
    disk_size                  = var.disk_size
    instance_types             = var.instance_types
    iam_role_attach_cni_policy = var.iam_role_attach_cni_policy
  }

  eks_managed_node_groups = {
    primary = {
      min_size      = 1
      max_size      = 5
      desired_size  = 1
      capacity_type = "ON_DEMAND"
    }
    secondary = {
      min_size      = 1
      max_size      = 5
      desired_size  = 1
      capacity_type = "SPOT"
    }
  }

  cluster_addons = {
    coredns = {
      most_recent                 = true
      resolve_conflicts_on_create = "OVERWRITE"
      resolve_conflicts_on_update = "PRESERVE"
      timeouts = {
        create = "20m"
        delete = "20m"
        update = "20m"
      }
    }
    kube-proxy = {
      most_recent                 = true
      resolve_conflicts_on_create = "OVERWRITE"
      resolve_conflicts_on_update = "PRESERVE"
      timeouts = {
        create = "20m"
        delete = "20m"
        update = "20m"
      }
    }
    aws-ebs-csi-driver = {
      most_recent                 = true
      resolve_conflicts_on_create = "OVERWRITE"
      resolve_conflicts_on_update = "PRESERVE"
      timeouts = {
        create = "20m"
        delete = "20m"
        update = "20m"
      }
    }
    vpc-cni = {
      most_recent                 = true
      resolve_conflicts_on_create = "OVERWRITE"
      resolve_conflicts_on_update = "PRESERVE"
      timeouts = {
        create = "20m"
        delete = "20m"
        update = "20m"
      }
    }
  }

  tags = {
    repo  = "https://github.com/impinj-di/terraform-aws-eks-primary"
    team  = "di"
    owner = "di_admins@impinj.com"
  }
}

####################################
#       KUBERNETES RESOURCES       #
####################################
resource "kubernetes_namespace" "this" {
  depends_on = [module.primary]
  for_each   = toset(local.eks_namespaces)

  metadata {
    name = each.key
  }
}
As you can see, I am not specifying the cluster or node-to-node security groups; in other words, we want the defaults.
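If it helps to confirm which groups are involved, they can be read back while the cluster is still up; a sketch, assuming the AWS CLI targets the same account/region and that $CLUSTER_NAME matches var.cluster_name:

# Sketch: show the EKS-managed cluster security group vs. any additional groups.
# $CLUSTER_NAME is a placeholder for whatever var.cluster_name is set to.
aws eks describe-cluster --name "$CLUSTER_NAME" \
  --query 'cluster.resourcesVpcConfig.{clusterSecurityGroup:clusterSecurityGroupId,additionalSecurityGroups:securityGroupIds}' \
  --output json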
Should I use the following within my workflow to destroy the infrastructure, or will this make a difference?
terraform destroy -force -no-color -input=false -auto-approve
Most probably there is an ENI (elastic network interface) still attached to that security group, which prevents its deletion and produces that dependency error.
The immediate fix is to identify the problematic ENI(s), for example with:
aws ec2 describe-network-interfaces --filters Name=group-id,Values=sg-0b11ee4a81d0092b2 --query 'NetworkInterfaces[*].NetworkInterfaceId'
then delete it manually and re-run the destroy command. That is only a temporary, one-time fix for this error, though.
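Something like the following wraps that clean-up into a single step. It is a rough sketch that assumes the leftover ENIs are in the available state (i.e. leaked rather than still attached) and that it runs from examples/complete/ with the same AWS credentials as the workflow:

SG_ID="sg-0b11ee4a81d0092b2"   # the security group from the error

# Find ENIs that still reference the security group but are no longer attached.
ENIS=$(aws ec2 describe-network-interfaces \
  --filters "Name=group-id,Values=${SG_ID}" "Name=status,Values=available" \
  --query 'NetworkInterfaces[*].NetworkInterfaceId' --output text)

# Delete the leaked ENIs, then retry the destroy.
for eni in $ENIS; do
  aws ec2 delete-network-interface --network-interface-id "$eni"
done

terraform destroy -no-color -input=false -auto-approve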
There is no simple Terraform-only solution either, since you didn't write the Terraform code yourself: you are consuming a ready-made module, and the terraform-aws-eks repository seems to be prone to the error you are reporting because of how it manages the dependencies between the security groups and the resources attached to them.
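If the clean-up keeps being needed, one workaround sometimes used in CI (not a guaranteed fix) is to stage the destroy so the node groups go first, which gives the VPC CNI a chance to release its ENIs before the security groups are removed. The -target addresses below are assumptions based on the module's internal layout and the node group keys in your main.tf; confirm them with terraform state list before relying on this:

# Rough sketch of a staged destroy. The module addresses are assumptions based on
# terraform-aws-modules/eks/aws v19 internals and the node group keys above
# ("primary" and "secondary") -- verify with `terraform state list` first.
terraform destroy -no-color -input=false -auto-approve \
  -target='module.primary.module.eks_managed_node_group["primary"]' \
  -target='module.primary.module.eks_managed_node_group["secondary"]'

# Then remove everything that is left, including the cluster and its security groups.
terraform destroy -no-color -input=false -auto-approve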