AWS seems to be hiding my NVMe SSD when an r6gd instance is deployed as a Kubernetes node via the eksctl config below.
# eksctl create cluster -f spot04test00.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: tidb-arm-dev # replace with your cluster name
  region: ap-southeast-1 # replace with your preferred AWS region
nodeGroups:
  - name: tiflash-1a
    desiredCapacity: 1
    availabilityZones: ["ap-southeast-1a"]
    instancesDistribution:
      instanceTypes: ["r6gd.medium"]
    privateNetworking: true
    labels:
      dedicated: tiflash
The running instance has an 80 GiB EBS gp3 volume and ZERO NVMe SSD storage, as shown in Figure 1.
Why did Amazon swap out the 59 GiB NVMe instance store for an 80 GiB EBS gp3 volume?
Where has my NVMe disk gone?
Even if I pre-allocate ephemeral-storage using non-managed nodeGroups, it still shows an 80 GiB EBS volume (Figure 1).
If I use the AWS web console to start a new r6gd instance directly, it clearly shows the attached NVMe SSD (Figure 2).
After further experimentation, I found that the 80 GiB EBS volume is attached to r6gd.medium, r6g.medium, r6gd.large, and r6g.large instances as the 'ephemeral' resource, regardless of instance size.
kubectl describe nodes:
Capacity:
  attachable-volumes-aws-ebs:  39
  cpu:                         2
  ephemeral-storage:           83864556Ki
  hugepages-2Mi:               0
  memory:                      16307140Ki
  pods:                        29
Allocatable:
  attachable-volumes-aws-ebs:  39
  cpu:                         2
  ephemeral-storage:           77289574682
  hugepages-2Mi:               0
  memory:                      16204740Ki
  pods:                        29
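(83864556Ki works out to just under 80 GiB, i.e. exactly the EBS volume; nothing resembling the 59 GiB instance store appears anywhere in the node's capacity.)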
Awaiting enlightenment from folks who have successfully utilized NVMe SSDs in Kubernetes.
Solved my own issue; here are my learnings:
1. The NVMe instance store does not show up by default (neither in the AWS web console nor as a mounted filesystem inside the VM), but it is there as a block device, in my case /dev/nvme1n1. Yes, you need to format and mount it. For a single VM that is straightforward, but for k8s you need to deliberately format it before you can use it (see the first sketch after this list).
2. The 80 GiB EBS volume can be overridden with the volumeSize / volumeType settings in the eksctl config file (see eks-setup.yaml below).
3. To utilize the VM-attached NVMe in k8s, you need to run these 2 additional kubernetes services while setting up the k8s nodes. Remember to modify the yaml files of the 2 services to use ARM64 images if you are using ARM64 VMs:
   a. storage-local-static-provisioner (see the second sketch after this list)
4. The NVMe will never show up as part of the ephemeral storage of your k8s cluster. That ephemeral storage describes the EBS volume attached to each VM; I have since restricted mine to a 20 GiB EBS volume.
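What "deliberately format" means in practice: something like the following has to run on each node at bootstrap (e.g. an eksctl preBootstrapCommands entry, or a privileged DaemonSet). A minimal sketch, assuming the instance store enumerates as /dev/nvme1n1 and that /mnt/disks is the discovery directory the provisioner (next sketch) is configured to scan:

# Format the instance-store NVMe (wipes any existing data on it).
sudo mkfs.ext4 /dev/nvme1n1
# Mount it under the provisioner's discovery directory so it
# gets picked up as a local PV.
sudo mkdir -p /mnt/disks/ssd0
sudo mount -o noatime /dev/nvme1n1 /mnt/disks/ssd0

Keep in mind instance-store data does not survive a stop/start, so this formatting step has to be repeatable on every node launch.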
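For storage-local-static-provisioner itself, the two pieces that matter are a no-provisioner StorageClass named local-storage (referenced by the PVs and the storageClaims below) and the provisioner's ConfigMap pointing at the discovery directory. A sketch, assuming the same /mnt/disks path as above; the ConfigMap name is illustrative, and the provisioner's DaemonSet (where you swap in the ARM64 image) is omitted:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: local-provisioner-config
  namespace: kube-system
data:
  storageClassMap: |
    local-storage:
      hostDir: /mnt/disks
      mountDir: /mnt/disks

Every filesystem the provisioner finds mounted under /mnt/disks becomes one local PV, which is where the 107Gi volumes below come from.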
The PVs will show up when you run kubectl get pv:
guiyu@mi:~/dst/bin$ kubectl get pv
NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                           STORAGECLASS    REASON   AGE
local-pv-1a3321d4   107Gi      RWO            Retain           Bound    tidb-cluster-dev/tikv-tidb-arm-dev-tikv-2       local-storage            9d
local-pv-82e9e739   107Gi      RWO            Retain           Bound    tidb-cluster-dev/pd-tidb-arm-dev-pd-1           local-storage            9d
local-pv-b9556b9b   107Gi      RWO            Retain           Bound    tidb-cluster-dev/data0-tidb-arm-dev-tiflash-2   local-storage            6d8h
local-pv-ce6f61f2   107Gi      RWO            Retain           Bound    tidb-cluster-dev/pd-tidb-arm-dev-pd-2           local-storage            9d
local-pv-da670e42   107Gi      RWO            Retain           Bound    tidb-cluster-dev/tikv-tidb-arm-dev-tikv-3       local-storage            6d8h
local-pv-f09b19f4   107Gi      RWO            Retain           Bound    tidb-cluster-dev/pd-tidb-arm-dev-pd-0           local-storage            9d
local-pv-f337849f   107Gi      RWO            Retain           Bound    tidb-cluster-dev/data0-tidb-arm-dev-tiflash-0   local-storage            9d
local-pv-ff2f11c6   107Gi      RWO            Retain           Bound    tidb-cluster-dev/tikv-tidb-arm-dev-tikv-0       local-storage            9d
Copies of my TiDB config files below for reference.

pods.yaml
tiflash:
  baseImage: pingcap/tiflash-arm64
  maxFailoverCount: 3
  replicas: 2
  nodeSelector:
    dedicated: tiflash
  tolerations:
    - effect: NoSchedule
      key: dedicated
      operator: Equal
      value: tiflash
  storageClaims:
    - resources:
        requests:
          storage: "100Gi"
      storageClassName: local-storage
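Note how each storageClaim requests 100Gi from the local-storage class, which fits inside the 107Gi local PVs carved from the NVMe.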
eks-setup.yaml
- name: tiflash-1a
  desiredCapacity: 1
  instanceTypes: ["r6gd.large"]
  privateNetworking: true
  availabilityZones: ["ap-southeast-1a"]
  spot: false
  volumeSize: 20 # GiB of EBS gp3, 3000 IOPS
  volumeType: gp3
  ssh:
    allow: true
    publicKeyPath: '~/dst/etc/data-platform-dev.pub'
  labels:
    dedicated: tiflash
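This nodeGroup entry slots into the same ClusterConfig shown in the question; with volumeSize: 20 the node's ephemeral-storage now reports roughly 20 GiB instead of 80, and the NVMe is consumed entirely through the local PVs.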