amazon-web-services, kubernetes, amazon-ec2

Missing NVMe SSD in AWS Kubernetes


AWS seems to be hiding my NVMe SSD when an r6gd instance is deployed in Kubernetes, created via the config below.

# eksctl create cluster -f spot04test00.yaml                                                      
apiVersion: eksctl.io/v1alpha5               
kind: ClusterConfig                          
metadata:                                    
  name: tidb-arm-dev #replace with your cluster name
  region: ap-southeast-1 #replace with your preferred AWS region
nodeGroups:                                  
  - name: tiflash-1a                         
    desiredCapacity: 1                       
    availabilityZones: ["ap-southeast-1a"]   
    instancesDistribution:                   
      instanceTypes: ["r6gd.medium"]         
    privateNetworking: true                  
    labels:                                  
      dedicated: tiflash

The running instance has an 80 GiB EBS gp3 volume and no NVMe SSD storage at all, as shown in Figure 1.

Figure 1. The 59 GiB NVMe SSD for the r6gd instance is swapped out for an 80 GiB gp3 EBS volume. What happened to my NVMe SSD?

Why did Amazon swap out the 59 GiB NVMe SSD for an 80 GiB EBS gp3 volume?

Where has my NVMe disk gone?

  1. Even if I pre-allocate ephemeral-storage using non-managed nodeGroups, the node still shows an 80 GiB EBS volume (Figure 1).

  2. If I use the AWS Web UI to start a new r6gd instance, it clearly shows the attached NVMe SSD (Figure 2).

Figure 2. The 59 GiB NVMe SSD for an r6gd instance created via the AWS Web Console.

After further experimentation, I found that an 80 GiB EBS volume is attached to r6gd.medium, r6g.medium, r6gd.large, and r6g.large instances as the 'ephemeral' resource, regardless of instance size.

kubectl describe nodes:

Capacity:
  attachable-volumes-aws-ebs:  39
  cpu:                         2
  ephemeral-storage:           83864556Ki
  hugepages-2Mi:               0
  memory:                      16307140Ki
  pods:                        29
Allocatable:
  attachable-volumes-aws-ebs:  39
  cpu:                         2
  ephemeral-storage:           77289574682
  hugepages-2Mi:               0
  memory:                      16204740Ki
  pods:                        29


Awaiting enlightenment from folks who have successfully utilized NVMe SSD in Kubernetes.


Solution

  • Solved my issue; here are my learnings:

    1. The NVMe SSD will not show up in the instance by default (either in the AWS web console or in the terminal of the VM), but it is accessible as /dev/nvme1. Yes, you need to format and mount it. For a single VM that is straightforward (see the sketch below), but for k8s you need to deliberately format it before you can use it.
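
       For a standalone VM, a minimal format-and-mount sketch (assuming the
       instance-store disk enumerates as /dev/nvme1n1 and using an arbitrary
       /mnt/nvme mount point; verify the device name with lsblk first):

         lsblk                                 # identify the instance-store NVMe block device
         sudo mkfs.ext4 /dev/nvme1n1           # format it (destroys any existing data on the disk)
         sudo mkdir -p /mnt/nvme               # the mount point is arbitrary
         sudo mount /dev/nvme1n1 /mnt/nvme     # mount; add an /etc/fstab entry to remount after a reboot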

    2. The 80 GB EBS volume can be overridden with settings in the cluster config file (see the sketch below).
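
       A sketch of that override in the eksctl ClusterConfig, setting the root
       EBS volume per nodeGroup (volumeSize is in GiB; the value 20 matches the
       restriction mentioned in point 4 and is only an example):

         nodeGroups:
           - name: tiflash-1a
             instancesDistribution:
               instanceTypes: ["r6gd.medium"]
             volumeSize: 20    # root EBS volume in GiB, replacing the 80 GiB default
             volumeType: gp3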

    3. To utilize the VM-attached NVMe in k8s, you need to run these 2 additional Kubernetes services while setting up the k8s nodes (a deployment sketch follows the two sub-items). Remember to modify the YAML files of the 2 services to use ARM64 images if you are using ARM64 VMs:

      a. storage-local-static-provisioner

      • ARM64 image: jasonxh/local-volume-provisioner:latest

      b. eks-nvme-ssd-provisioner

      • ARM64 image: zhangguiyu/eks-nvme-ssd-provisioner
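
       Roughly, deploying the two services looks like the sketch below. The
       manifest file names are placeholders for the YAML shipped in each
       project's repository; edit their image: fields to the ARM64 images
       above before applying on ARM64 nodes.

         kubectl apply -f eks-nvme-ssd-provisioner.yaml    # formats and mounts the instance-store NVMe disks
         kubectl apply -f local-volume-provisioner.yaml    # exposes the mounts as local PersistentVolumes
         kubectl get pods -A | grep provisioner            # both should be running as DaemonSet pods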

    4. The NVMe SSD will never show up as part of the ephemeral storage of your k8s cluster; that ephemeral storage describes the EBS volume attached to each VM. I have since restricted mine to a 20 GB EBS volume.

    5. The PVs will show up when you run kubectl get pv (and the bound claims with kubectl get pvc):
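
       For reference, a workload consumes one of these local PVs through an
       ordinary PersistentVolumeClaim bound to the provisioner's StorageClass
       (a sketch; the StorageClass name local-storage and the requested size
       are assumptions that depend on the provisioner's configuration):

         apiVersion: v1
         kind: PersistentVolumeClaim
         metadata:
           name: nvme-claim                   # illustrative name
         spec:
           accessModes: ["ReadWriteOnce"]
           storageClassName: local-storage    # must match the StorageClass created by the provisioner
           resources:
             requests:
               storage: 50Gi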

    6. Copies of the TiDB node config files are below for reference: