
AWS EBS CSI Driver: Could not delete volume ID "vol-XXX": DeleteDisk could not delete volume: UnauthorizedOperation


Version of the AWS EBS CSI driver

➜ helm search repo ebs                                
NAME                                    CHART VERSION   APP VERSION DESCRIPTION                                       
aws-ebs-csi-driver/aws-ebs-csi-driver   2.10.1          1.11.2      A Helm chart for AWS EBS CSI Driver 

PV tags

The created AWS volume has the following tags:

kubernetes.io/cluster/cluster-prod-eks  owned
Name    kubernetes-dynamic-pvc-07ee7aa2-5a7d-4f2b-af34-6fea3cda2fa8
kubernetes.io/created-for/pvc/name  teleport
kubernetes.io/created-for/pv/name   pvc-07ee7aa2-5a7d-4f2b-af34-6fea3cda2fa8
kubernetes.io/created-for/pvc/namespace teleport-cluster

storage class

➜ kg sc gp2 -o yaml                             
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"},"name":"gp2"},"parameters":{"fsType":"ext4","type":"gp2"},"provisioner":"kubernetes.io/aws-ebs","volumeBindingMode":"WaitForFirstConsumer"}
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2022-05-31T04:31:24Z"
  name: gp2
  resourceVersion: "271"
  uid: ef10dad0-6a58-4251-a802-d14744bcde43
parameters:
  fsType: ext4
  type: gp2
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

behavior

Once the PVC is deleted, the ebs-csi-controller is unable to delete the PV and its EBS volume. The volumes all remain "active" in AWS (not attached to anything), and the ebs-csi-controller logs keep repeating this message:

➜ klo -l app=ebs-csi-controller -c ebs-plugin
E0902 00:34:52.026908       1 driver.go:120] GRPC error: rpc error: code = Internal desc = Could not delete volume ID "vol-06f9e6132612684e6": DeleteDisk could not delete volume: UnauthorizedOperation: You are not authorized to perform this operation. Encoded authorization failure message: XXXXX
    status code: 403, request id: ca6d72c5-c9f0-4b41-bcac-c92058b05aad

➜ decode_aws XXXXX

{
  "allowed": false,
  "explicitDeny": false,
  "matchedStatements": {
    "items": []
  },
  "failures": {
    "items": []
  },
  "context": {
    "principal": {
      "id": "AROAV5U3TKTQCOL2VXPJD:i-024e6c62a64f813a9",
      "arn": "arn:aws:sts::44444444444:assumed-role/prod-ops-v2-eks-node-group-20220629070052177800000004/i-024e6c62a64f813a9"
    },
    "action": "ec2:DeleteVolume",
    "resource": "arn:aws:ec2:us-east-1:44444444444:volume/vol-0e7e820f437b74069",
    "conditions": {
      "items": [
        {
          "key": "ec2:ResourceTag/kubernetes.io/cluster/cluster-prod-eks",
          "values": {
            "items": [
              {
                "value": "owned"
              }
            ]
          }
        },
        {
          "key": "44444444444:Name",
          "values": {
            "items": [
              {
                "value": "kubernetes-dynamic-pvc-07ee7aa2-5a7d-4f2b-af34-6fea3cda2fa8"
              }
            ]
          }
        },
        {
          "key": "aws:Resource",
          "values": {
            "items": [
              {
                "value": "volume/vol-0e7e820f437b74069"
              }
            ]
          }
        },
        {
          "key": "aws:Account",
          "values": {
            "items": [
              {
                "value": "44444444444"
              }
            ]
          }
        },
        {
          "key": "ec2:AvailabilityZone",
          "values": {
            "items": [
              {
                "value": "us-east-1b"
              }
            ]
          }
        },
        {
          "key": "ec2:Encrypted",
          "values": {
            "items": [
              {
                "value": "false"
              }
            ]
          }
        },
        {
          "key": "ec2:ResourceTag/Name",
          "values": {
            "items": [
              {
                "value": "kubernetes-dynamic-pvc-07ee7aa2-5a7d-4f2b-af34-6fea3cda2fa8"
              }
            ]
          }
        },
        {
          "key": "ec2:VolumeType",
          "values": {
            "items": [
              {
                "value": "gp2"
              }
            ]
          }
        },
        {
          "key": "ec2:ResourceTag/kubernetes.io/created-for/pv/name",
          "values": {
            "items": [
              {
                "value": "pvc-07ee7aa2-5a7d-4f2b-af34-6fea3cda2fa8"
              }
            ]
          }
        },
        {
          "key": "aws:Region",
          "values": {
            "items": [
              {
                "value": "us-east-1"
              }
            ]
          }
        },
        {
          "key": "aws:Service",
          "values": {
            "items": [
              {
                "value": "ec2"
              }
            ]
          }
        },
        {
          "key": "ec2:VolumeID",
          "values": {
            "items": [
              {
                "value": "vol-0e7e820f437b74069"
              }
            ]
          }
        },
        {
          "key": "44444444444:kubernetes.io/created-for/pvc/namespace",
          "values": {
            "items": [
              {
                "value": "teleport-cluster"
              }
            ]
          }
        },
        {
          "key": "ec2:VolumeSize",
          "values": {
            "items": [
              {
                "value": "10"
              }
            ]
          }
        },
        {
          "key": "44444444444:kubernetes.io/created-for/pv/name",
          "values": {
            "items": [
              {
                "value": "pvc-07ee7aa2-5a7d-4f2b-af34-6fea3cda2fa8"
              }
            ]
          }
        },
        {
          "key": "ec2:ResourceTag/kubernetes.io/created-for/pvc/namespace",
          "values": {
            "items": [
              {
                "value": "teleport-cluster"
              }
            ]
          }
        },
        {
          "key": "aws:Type",
          "values": {
            "items": [
              {
                "value": "volume"
              }
            ]
          }
        },
        {
          "key": "ec2:VolumeIOPS",
          "values": {
            "items": [
              {
                "value": "100"
              }
            ]
          }
        },
        {
          "key": "ec2:ResourceTag/kubernetes.io/created-for/pvc/name",
          "values": {
            "items": [
              {
                "value": "teleport"
              }
            ]
          }
        },
        {
          "key": "ec2:Region",
          "values": {
            "items": [
              {
                "value": "us-east-1"
              }
            ]
          }
        },
        {
          "key": "aws:ARN",
          "values": {
            "items": [
              {
                "value": "arn:aws:ec2:us-east-1:44444444444:volume/vol-0e7e820f437b74069"
              }
            ]
          }
        },
        {
          "key": "44444444444:kubernetes.io/created-for/pvc/name",
          "values": {
            "items": [
              {
                "value": "teleport"
              }
            ]
          }
        },
        {
          "key": "44444444444:kubernetes.io/cluster/cluster-prod-eks",
          "values": {
            "items": [
              {
                "value": "owned"
              }
            ]
          }
        }
      ]
    }
  }
}
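
`decode_aws` above is not a standard command and is presumably a local alias. A minimal sketch of such a helper, assuming it wraps the AWS CLI subcommand `aws sts decode-authorization-message`, might look like the following. The function body is an assumption, not the asker's actual helper, and the caller needs the `sts:DecodeAuthorizationMessage` permission.

```shell
# Hypothetical stand-in for the decode_aws helper used above.
# Prints the decoded authorization failure message as raw JSON text.
decode_aws() {
  # $1 is the encoded blob that follows
  # "Encoded authorization failure message:" in the error.
  aws sts decode-authorization-message \
    --encoded-message "$1" \
    --query DecodedMessage \
    --output text
}
```

The output can be piped through a JSON pretty-printer such as `jq .` to get the layout shown above.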

My instance i-024e6c62a64f813a9, which runs the ebs-csi-controller pod, has the AWS managed policy AmazonEBSCSIDriverPolicy attached to its IAM role (assumed as arn:aws:sts::44444444444:assumed-role/prod-ops-v2-eks-node-group-20220629070052177800000004/i-024e6c62a64f813a9):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateSnapshot",
                "ec2:AttachVolume",
                "ec2:DetachVolume",
                "ec2:ModifyVolume",
                "ec2:DescribeAvailabilityZones",
                "ec2:DescribeInstances",
                "ec2:DescribeSnapshots",
                "ec2:DescribeTags",
                "ec2:DescribeVolumes",
                "ec2:DescribeVolumesModifications"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateTags"
            ],
            "Resource": [
                "arn:aws:ec2:*:*:volume/*",
                "arn:aws:ec2:*:*:snapshot/*"
            ],
            "Condition": {
                "StringEquals": {
                    "ec2:CreateAction": [
                        "CreateVolume",
                        "CreateSnapshot"
                    ]
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DeleteTags"
            ],
            "Resource": [
                "arn:aws:ec2:*:*:volume/*",
                "arn:aws:ec2:*:*:snapshot/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateVolume"
            ],
            "Resource": "*",
            "Condition": {
                "StringLike": {
                    "aws:RequestTag/ebs.csi.aws.com/cluster": "true"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateVolume"
            ],
            "Resource": "*",
            "Condition": {
                "StringLike": {
                    "aws:RequestTag/CSIVolumeName": "*"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateVolume"
            ],
            "Resource": "*",
            "Condition": {
                "StringLike": {
                    "aws:RequestTag/kubernetes.io/cluster/*": "owned"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DeleteVolume"
            ],
            "Resource": "*",
            "Condition": {
                "StringLike": {
                    "ec2:ResourceTag/ebs.csi.aws.com/cluster": "true"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DeleteVolume"
            ],
            "Resource": "*",
            "Condition": {
                "StringLike": {
                    "ec2:ResourceTag/CSIVolumeName": "*"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DeleteVolume"
            ],
            "Resource": "*",
            "Condition": {
                "StringLike": {
                    "ec2:ResourceTag/kubernetes.io/cluster/*": "owned"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DeleteSnapshot"
            ],
            "Resource": "*",
            "Condition": {
                "StringLike": {
                    "ec2:ResourceTag/CSIVolumeSnapshotName": "*"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DeleteSnapshot"
            ],
            "Resource": "*",
            "Condition": {
                "StringLike": {
                    "ec2:ResourceTag/ebs.csi.aws.com/cluster": "true"
                }
            }
        }
    ]
}

Expectation

Since the volume is tagged with kubernetes.io/cluster/cluster-prod-eks: owned, I expected it to be deleted automatically, because the storage class has reclaimPolicy: Delete. Instead I get the error message above.

What am I doing wrong?

CloudTrail entry

{
    "eventVersion": "1.08",
    "userIdentity": {
        "type": "AssumedRole",
        "principalId": "AROAV5U3TKTQCOL2VXPJD:i-024e6c62a64f813a9",
        "arn": "arn:aws:sts::44444444444:assumed-role/prod-ops-v2-eks-node-group-20220629070052177800000004/i-024e6c62a64f813a9",
        "accountId": "44444444444",
        "accessKeyId": "AAAAAAAAAAAAAAAAAA",
        "sessionContext": {
            "sessionIssuer": {
                "type": "Role",
                "principalId": "AROAV5U3TKTQCOL2VXPJD",
                "arn": "arn:aws:iam::44444444444:role/prod-ops-v2-eks-node-group-20220629070052177800000004",
                "accountId": "44444444444",
                "userName": "prod-ops-v2-eks-node-group-20220629070052177800000004"
            },
            "webIdFederationData": {},
            "attributes": {
                "creationDate": "2022-09-01T19:22:40Z",
                "mfaAuthenticated": "false"
            },
            "ec2RoleDelivery": "2.0"
        }
    },
    "eventTime": "2022-09-02T01:18:36Z",
    "eventSource": "ec2.amazonaws.com",
    "eventName": "DeleteVolume",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "11.11.11.111",
    "userAgent": "aws-sdk-go/1.44.45 (go1.17.13; linux; amd64) exec-env/aws-ebs-csi-driver-v1.11.2",
    "errorCode": "Client.UnauthorizedOperation",
    "errorMessage": "You are not authorized to perform this operation. Encoded authorization failure message: XXXXX",
    "requestParameters": {
        "volumeId": "vol-06f9e6132612684e6"
    },
    "responseElements": null,
    "requestID": "461cafb4-00e6-4f70-a0de-2660517dd1fe",
    "eventID": "04a67446-0bd5-40f0-bfd5-3a8d5938f652",
    "readOnly": false,
    "eventType": "AwsApiCall",
    "managementEvent": true,
    "recipientAccountId": "44444444444",
    "eventCategory": "Management",
    "tlsDetails": {
        "tlsVersion": "TLSv1.2",
        "cipherSuite": "ECDHE-RSA-AES128-GCM-SHA256",
        "clientProvidedHostHeader": "ec2.us-east-1.amazonaws.com"
    }
}

Solution

  • solution:

    The fix was to add two tags, CSIVolumeName and ebs.csi.aws.com/cluster, to the volume resources so that the DeleteVolume conditions of the managed policy arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy matched.

    Bash script to automate this for many PVCs:

    # Both filters go under a single --filters option so that they are combined.
    for vlm in $(aws ec2 describe-volumes \
        --filters "Name=tag:kubernetes.io/cluster/xxx-yyy-eks,Values=owned" "Name=tag:Name,Values=kubernetes-dynamic-pvc*" \
        --query "Volumes[*].VolumeId" --output text); do
       # Reuse the volume's Name tag as the value of the CSIVolumeName tag.
       CSIVolumeName=$(aws ec2 describe-volumes --volume-ids "$vlm" --query "Volumes[*].Tags[]" | jq -r '.[] | select(.Key=="Name").Value');
       echo "$vlm,$CSIVolumeName"
       aws ec2 create-tags --resources "$vlm" --tags "Key=CSIVolumeName,Value=$CSIVolumeName"
       aws ec2 create-tags --resources "$vlm" --tags "Key=ebs.csi.aws.com/cluster,Value=true"
    done
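
After tagging, it may help to confirm that a volume now carries a tag the policy's DeleteVolume statements accept. The sketch below is an illustration, not part of the driver: `has_delete_tag` is a hypothetical helper that reads tab-separated key/value pairs and checks only the two conditions targeted by this fix (`CSIVolumeName` with any value, `ebs.csi.aws.com/cluster` equal to `true`). The `kubernetes.io/cluster/*` condition is deliberately not checked, since the volume in the question carried that tag and was still denied.

```shell
# Hypothetical helper: reads "Key<TAB>Value" lines on stdin and returns 0
# if at least one tag satisfies one of these DeleteVolume conditions:
#   ec2:ResourceTag/CSIVolumeName            like "*"
#   ec2:ResourceTag/ebs.csi.aws.com/cluster  like "true"
has_delete_tag() {
  local tab key value
  tab=$(printf '\t')
  while IFS="$tab" read -r key value; do
    case "$key" in
      CSIVolumeName)
        return 0
        ;;
      ebs.csi.aws.com/cluster)
        if [ "$value" = "true" ]; then
          return 0
        fi
        ;;
    esac
  done
  return 1
}
```

It could be fed from the CLI along these lines: `aws ec2 describe-volumes --volume-ids "$vlm" --query "Volumes[0].Tags[].[Key,Value]" --output text | has_delete_tag && echo "$vlm ok"`.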
    

    Then attach that policy to the IAM role used by the EC2 instance on which the CSI controller is running. The commands below look up that role:

    ➜ node=$(kubectl get nodes \
         $(kubectl get pods \
               -l app=ebs-csi-controller \
               -n kube-system \
               --no-headers \
               -o="custom-columns=NAME:.spec.nodeName") \
         -o="custom-columns=NAME:.spec.providerID" --no-headers | grep -o 'i-.*')
    
    ➜ aws ec2 describe-instances \
        --instance-ids $node --query "Reservations[0].Instances[0].IamInstanceProfile.Id"
    
    AIPA6BMT6GMYXEPTKCHX1
    
    ➜ aws iam list-instance-profiles \
        --query "InstanceProfiles[?InstanceProfileId=='AIPA6BMT6GMYXEPTKCHX1'].Roles[]"
    
    [
        {
            "Path": "/",
            "RoleName": "xxx-yyy-eks20210302004820371500000013",
            "RoleId": "AROA6BMT6GMY3HIGPUGQD",
            "Arn": "arn:aws:iam::xx:role/xxx-yyy-eks20210302004820371500000013",
            "CreateDate": "2021-03-02T00:48:20+00:00",
            "AssumeRolePolicyDocument": {
                "Version": "2012-10-17",
                "Statement": [
                    {
                        "Sid": "EKSWorkerAssumeRole",
                        "Effect": "Allow",
                        "Principal": {
                            "Service": "ec2.amazonaws.com"
                        },
                        "Action": "sts:AssumeRole"
                    }
                ]
            }
        }
    ]
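
The lookup above ends with the role, but the attach step itself is not shown. A minimal sketch of it, assuming the standard AWS CLI subcommands `aws iam attach-role-policy` and `aws iam list-attached-role-policies`, could look like this. The helper names `role_name_from_arn` and `attach_csi_policy` are made up for illustration.

```shell
# Managed policy ARN quoted in the solution above.
POLICY_ARN="arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"

# Hypothetical helper: the role name is whatever follows the last "/"
# in the role ARN (this also drops any IAM path).
role_name_from_arn() {
  printf '%s\n' "${1##*/}"
}

# Hypothetical helper: attach the managed policy to the role, then print
# the names of the policies attached to it as a quick check.
attach_csi_policy() {
  local role_name="$1"
  aws iam attach-role-policy \
    --role-name "$role_name" \
    --policy-arn "$POLICY_ARN" &&
    aws iam list-attached-role-policies \
      --role-name "$role_name" \
      --query "AttachedPolicies[].PolicyName" \
      --output text
}
```

For the role found above this would be called as `attach_csi_policy "$(role_name_from_arn "arn:aws:iam::xx:role/xxx-yyy-eks20210302004820371500000013")"`.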