Tags: kubernetes, azure-aks, kubernetes-pod, spot-instances

NodeSelector does not work for multiple node pools?


TL;DR: nodeSelector schedules pods onto only one of my node pools and ignores nodes from the other, even though both pools carry the same label. How can I distribute pods across multiple node pools using a label-based nodeSelector or some other technique?

I have two node pools defined like this in Terraform:

...
# Spot node pool
resource "azurerm_kubernetes_cluster_node_pool" "aks_staging_np_compute_spot" {
  name                  = "computespot"
  (...)
  vm_size               = "Standard_F8s_v2"
  max_count             = 2
  min_count             = 2
  (...)
  priority              = "Spot"
  eviction_policy       = "Delete"
  (...)
  node_labels = {
    "pool_type" = "compute"
  }
}

# Regular node pool
resource "azurerm_kubernetes_cluster_node_pool" "aks_staging_np_compute_base" {
  name                  = "computebase"
  (...)
  vm_size               = "Standard_F8s_v2"
  max_count             = 2
  min_count             = 2
  node_labels = {
    "pool_type" = "compute"
  }
}

Both pools are deployed in AKS and all of their nodes are in the Ready state. Please note two things: both pools carry the identical pool_type = "compute" label, and both use the same VM size.

(There are also 20 other nodes with different labels in my cluster, but they are not relevant here.)
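For reference, here is roughly what the relevant metadata of a node from the spot pool should look like, a sketch assuming the labels AKS adds automatically (the agentpool value matches the pool name; the exact label set may vary by AKS version):

# Excerpt of a node from the "computespot" pool (sketch):
metadata:
  labels:
    agentpool: computespot                        # added by AKS
    kubernetes.azure.com/scalesetpriority: spot   # present only on spot nodes
    pool_type: compute                            # from node_labels in Terraform

Nodes from "computebase" should look the same except for agentpool: computebase and the absence of the scalesetpriority label.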

Then I have a Deployment like this (irrelevant lines omitted for brevity):

apiVersion: apps/v1
kind: Deployment
metadata:
  (...)
spec:
  replicas: 4
  selector:
    matchLabels:
      app: myapp
  template:
    (...)
    spec:
      nodeSelector:
        pool_type: compute
      (...)
      containers:
        (...)

There is also a tolerations entry for accepting Azure spot instances, which apparently works:

      tolerations:
        - key: "kubernetes.azure.com/scalesetpriority"
          operator: "Equal"
          value: "spot"
          effect: "NoSchedule"
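That toleration is needed because AKS taints every node in a spot pool so that only pods explicitly tolerating the taint can land there. On the node object the taint should look roughly like this (a sketch, not output copied from this cluster):

# Node spec excerpt for a "computespot" node (sketch):
spec:
  taints:
    - key: kubernetes.azure.com/scalesetpriority
      value: spot
      effect: NoSchedule

The regular "computebase" pool carries no such taint, so pods can land there without any toleration.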

The problem is that the app gets scheduled onto only one node pool ("computespot" in this case) and never onto the other ("computebase"), even though the label and the VM size of the individual nodes are the same.

How can this be solved?


Solution

  • Found a solution using node affinity.

    spec:
      # This didn't work:
      #
      # nodeSelector:
      #   pool_type: compute
      #
      # But this does:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: pool_type
                    operator: In
                    values:
                      - compute
    

    I don't know the reason, because we're still dealing with the same single label. If someone knows, please share.
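    For a single label, nodeSelector and this required node affinity should be semantically equivalent hard filters, so the difference may have come from scheduler scoring rather than filtering. If the actual goal is an even spread of replicas across both pools, topologySpreadConstraints expresses that intent directly. A minimal sketch, assuming the agentpool label that AKS puts on every node and the app: myapp label from the Deployment above:

    spec:
      topologySpreadConstraints:
        - maxSkew: 1                         # per-pool replica counts may differ by at most 1
          topologyKey: agentpool             # spread across distinct values of this node label
          whenUnsatisfiable: ScheduleAnyway  # prefer the spread, but don't block scheduling
          labelSelector:
            matchLabels:
              app: myapp

    With whenUnsatisfiable: ScheduleAnyway the spread is only a preference; change it to DoNotSchedule to make it a hard requirement.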