Tags: kubernetes, terraform, terraform-provider-aws, amazon-eks, cloudposse

Terraform: "Error: error deleting S3 Bucket" while trying to destroy EKS cluster


I created an EKS cluster using the example given in the Cloudposse EKS Terraform module.

On top of this, I created an AWS S3 bucket and a DynamoDB table to store the state file and the lock, respectively, and added both to the Terraform backend config.

This is how it looks:

resource "aws_s3_bucket" "terraform_state" {
  bucket = "${var.namespace}-${var.name}-terraform-state"
  # Enable versioning so we can see the full revision history of our
  # state files
  versioning {
    enabled = true
  }
  # Enable server-side encryption by default
  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        sse_algorithm = "aws:kms"
      }
    }
  }
}

resource "aws_dynamodb_table" "terraform_locks" {
  name         = "${var.namespace}-${var.name}-running-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"
  attribute {
    name = "LockID"
    type = "S"
  }
}

terraform {
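  backend "s3" {
    # Note: backend blocks cannot reference variables or interpolations, so in
    # practice these values have to be literals or passed via -backend-config.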
    bucket = "${var.namespace}-${var.name}-terraform-state"
    key    = "${var.stage}/terraform.tfstate"
    region = var.region
    # Replace this with your DynamoDB table name!
    dynamodb_table = "${var.namespace}-${var.name}-running-locks"
    encrypt        = true
  }
}

Now when I try to delete the EKS cluster using terraform destroy, I get this error:

Error: error deleting S3 Bucket (abc-eks-terraform-state): BucketNotEmpty: The bucket you tried to delete is not empty. You must delete all versions in the bucket.

This is the output of terraform plan -destroy after the cluster was partially destroyed because of the S3 error:

Changes to Outputs:
  - dynamodb_table_name             = "abc-eks-running-locks" -> null
  - eks_cluster_security_group_name = "abc-staging-eks-cluster" -> null
  - eks_cluster_version             = "1.19" -> null
  - eks_node_group_role_name        = "abc-staging-eks-workers" -> null
  - private_subnet_cidrs            = [
      - "172.16.0.0/19",
      - "172.16.32.0/19",
    ] -> null
  - public_subnet_cidrs             = [
      - "172.16.96.0/19",
      - "172.16.128.0/19",
    ] -> null
  - s3_bucket_arn                   = "arn:aws:s3:::abc-eks-terraform-state" -> null
  - vpc_cidr                        = "172.16.0.0/16" -> null

I cannot manually delete the tfstate in S3, because that would make Terraform recreate everything. I also tried to remove the S3 resource from the state, but that gives me a lock error (I also tried forcibly removing the lock and running with -lock=false).

So I wanted to know: is there a way to tell Terraform to delete the S3 bucket last, once everything else has been deleted? Or is there a way to use the Terraform state that is stored in S3 locally?

What's the correct approach to deleting an EKS cluster when your Terraform state resides in an S3 backend and the S3 bucket and DynamoDB table were created by the same Terraform configuration?


Solution

  • Generally, it is not recommended to manage the S3 bucket that holds Terraform's backend state in that same Terraform state (for this exact reason). I've seen this explicitly stated in the Terraform documentation, but I was unable to find it in a quick search.

    What I would do to solve this issue:

    1. Force unlock the Terraform lock (terraform force-unlock LOCK_ID, where LOCK_ID is shown in the error message it gives you when you try to run a command).
    2. Download the state file from S3 (via the AWS console or CLI).
    3. Create a new S3 bucket (manually, not in Terraform).
    4. Manually upload the state file to the new bucket.
    5. Modify your Terraform backend config to use the new bucket.
    6. Empty the old S3 bucket, including all object versions and delete markers, since versioning is enabled (via the AWS console or CLI).
    7. Re-run Terraform and allow it to delete the old S3 bucket.

    Since it's still using the same old state file (just from a different bucket now), it won't re-create everything, and you'll be able to decouple your TF state bucket/file from other resources.
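
    For step 5, the updated backend block might look like the sketch below. The bucket name abc-eks-terraform-state-v2 and the region are hypothetical placeholders, not values from the question, and the key assumes the stage is "staging". Backend blocks cannot use variables, so the values are written as literals. dynamodb_table is left out on the assumption that the existing lock table is managed by this same configuration and will be destroyed with it; point it at a separately created table if you want to keep locking.

    ```hcl
    terraform {
      backend "s3" {
        # Hypothetical bucket created by hand in step 3 (not managed by Terraform)
        bucket  = "abc-eks-terraform-state-v2"
        # Must match the object key used when uploading the state file in step 4
        key     = "staging/terraform.tfstate"
        # Placeholder region; use the region of the new bucket
        region  = "us-east-1"
        encrypt = true
      }
    }
    ```

    After changing the backend, terraform init has to be run again. Since the state file was already copied by hand, terraform init -reconfigure should pick up the new bucket without attempting a migration.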

    If, for whatever reason, Terraform refuses to force-unlock, you can go into the DynamoDB table via the AWS console and delete the lock manually.
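
    As a possible alternative to emptying the old bucket by hand in step 6, the AWS provider's aws_s3_bucket resource has a force_destroy argument that deletes all objects, including old versions, when the bucket is destroyed. A minimal sketch, reusing the bucket resource from the question and assuming the state has already been moved to a different bucket (otherwise Terraform would be deleting its own state mid-run):

    ```hcl
    resource "aws_s3_bucket" "terraform_state" {
      bucket = "${var.namespace}-${var.name}-terraform-state"

      # Lets terraform destroy remove the bucket even if it still
      # contains objects and object versions
      force_destroy = true

      versioning {
        enabled = true
      }

      server_side_encryption_configuration {
        rule {
          apply_server_side_encryption_by_default {
            sse_algorithm = "aws:kms"
          }
        }
      }
    }
    ```

    The new value generally has to be recorded in the state with terraform apply before terraform destroy will honor it.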