azure kubernetes azure-aks huge-pages

Impossible to activate HugePage on AKS nodes


Hi, dear Stack Overflow community,

I'm struggling to enable HugePages on an AKS cluster.

  1. I understand that I first have to configure a node pool with HugePages support.
  2. Then I have to configure the pod as well.
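For reference, step 2 usually means requesting a `hugepages-<size>` resource in the pod spec and mounting a `HugePages`-backed volume. The following is only a minimal sketch modeled on the example in the Kubernetes documentation; the file name, pod name, image, and sizes are illustrative assumptions, and the pod can only be scheduled once a node actually reports a non-zero `hugepages-2Mi` capacity.

```shell
# Write a pod manifest that requests 100Mi of 2 MiB huge pages.
# All names and sizes below are hypothetical placeholders.
cat > hugepages-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: hugepages-example
spec:
  containers:
  - name: example
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - mountPath: /hugepages
      name: hugepage
    resources:
      limits:
        hugepages-2Mi: 100Mi
        memory: 100Mi
      requests:
        memory: 100Mi
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
EOF

# Apply it against the cluster once the node side is ready:
# kubectl apply -f hugepages-pod.yaml
```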

But despite everything I've tried, I could not get it working.

Following the Microsoft documentation, my node pool is created with this configuration:

    "kubeletConfig": {
      "allowedUnsafeSysctls": null,
      "cpuCfsQuota": null,
      "cpuCfsQuotaPeriod": null,
      "cpuManagerPolicy": null,
      "failSwapOn": false,
      "imageGcHighThreshold": null,
      "imageGcLowThreshold": null,
      "topologyManagerPolicy": null
    },
    "linuxOsConfig": {
      "swapFileSizeMb": null,
      "sysctls": {
        "fsAioMaxNr": null,
        "fsFileMax": null,
        "fsInotifyMaxUserWatches": null,
        "fsNrOpen": null,
        "kernelThreadsMax": null,
        "netCoreNetdevMaxBacklog": null,
        "netCoreOptmemMax": null,
        "netCoreRmemMax": null,
        "netCoreSomaxconn": null,
        "netCoreWmemMax": null,
        "netIpv4IpLocalPortRange": "32000 60000",
        "netIpv4NeighDefaultGcThresh1": null,
        "netIpv4NeighDefaultGcThresh2": null,
        "netIpv4NeighDefaultGcThresh3": null,
        "netIpv4TcpFinTimeout": null,
        "netIpv4TcpKeepaliveProbes": null,
        "netIpv4TcpKeepaliveTime": null,
        "netIpv4TcpMaxSynBacklog": null,
        "netIpv4TcpMaxTwBuckets": null,
        "netIpv4TcpRmem": null,
        "netIpv4TcpTwReuse": null,
        "netIpv4TcpWmem": null,
        "netIpv4TcpkeepaliveIntvl": null,
        "netNetfilterNfConntrackBuckets": null,
        "netNetfilterNfConntrackMax": null,
        "vmMaxMapCount": null,
        "vmSwappiness": null,
        "vmVfsCachePressure": null
      },
      "transparentHugePageDefrag": "defer+madvise",
      "transparentHugePageEnabled": "madvise"
    }

But my node still looks like this:

    # kubectl describe nodes aks-deadpoolhp-31863567-vmss000000|grep hugepage
    Capacity:
      attachable-volumes-azure-disk:  16
      cpu:                            8
      ephemeral-storage:              129901008Ki
      hugepages-1Gi:                  0
      hugepages-2Mi:                  0
      memory:                         32940620Ki
      pods:                           110
    Allocatable:
      attachable-volumes-azure-disk:  16
      cpu:                            7820m
      ephemeral-storage:              119716768775
      hugepages-1Gi:                  0
      hugepages-2Mi:                  0
      memory:                         28440140Ki
      pods:                           110

My Kubernetes version is 1.16.15.

I also saw that I should enable a feature gate like this: --feature-gates=HugePages=true (https://dev.to/dannypsnl/hugepages-on-kubernetes-5e7p), but I don't know how to do that in AKS. Anyway, since my node is not reporting any HugePages capacity, I'm not sure that would help for now.

I even tried recreating the AKS cluster with a --kubeconfig, but everything remains the same: I cannot use HugePages.

Please, I need your help again; I'm completely lost with this AKS service.


Solution

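  • # Install the kubectl node-shell plugin.
    curl -LO https://github.com/kvaps/kubectl-node-shell/raw/master/kubectl-node_shell
    chmod +x ./kubectl-node_shell
    sudo mv ./kubectl-node_shell /usr/local/bin/kubectl-node_shell

    # Find the node that runs your pod.
    kubectl get pod -n <YOUR_NAMESPACE>
    kubectl get pod <YOUR_POD> -o custom-columns=NODE:.spec.nodeName -n <YOUR_NAMESPACE>

    # Open a root shell on that node.
    kubectl node-shell <NODE>

    # On the node: mount hugetlbfs and pre-allocate 1024 huge pages of 2 MiB on NUMA node 0.
    # The node only reports pre-allocated pages as hugepages-2Mi; the transparent huge page settings do not count.
    mkdir -p /mnt/huge
    mount -t hugetlbfs nodev /mnt/huge
    echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
    cat  /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages

    # Restart the kubelet so that it picks up the new huge page count.
    systemctl restart kubelet

    # Exit the node shell, then check what the node now reports.
    kubectl describe node <NODE>|grep -i -e "capacity" -e "allocatable" -e "huge"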
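
To double-check the pre-allocation from inside the node shell, the kernel's own counters in /proc/meminfo can be read directly. The helper below is only a sketch: the function names `hugepages_field` and `hugepages_summary` are made up for this example, and the optional file argument exists purely so the parsing can be tried against a saved copy of meminfo.

```shell
# Print one field from a meminfo-style file.
# $1 = field name (e.g. HugePages_Total), $2 = file (defaults to /proc/meminfo)
hugepages_field() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Summarize pre-allocated huge pages: count, free pages, page size, total reserved memory.
# $1 = file to read (defaults to /proc/meminfo)
hugepages_summary() {
    file="${1:-/proc/meminfo}"
    total=$(hugepages_field HugePages_Total "$file")
    free=$(hugepages_field HugePages_Free "$file")
    size_kb=$(hugepages_field Hugepagesize "$file")
    # Fall back to 0 when a field is missing (kernel built without hugetlb support).
    total=${total:-0}
    free=${free:-0}
    size_kb=${size_kb:-0}
    echo "total=${total} free=${free} size_kB=${size_kb} reserved_MiB=$(( total * size_kb / 1024 ))"
}
```

Running `hugepages_summary` with no argument on the node prints the live values. If `total` is still 0 after the `echo 1024 > ...` step, the kernel could not reserve the pages (for example because memory was too fragmented), and restarting the kubelet will not change what the node reports.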