azure terraform azure-aks terraform-provider-azure azure-load-balancer

With Terraform, how do I integrate a basic-sku load balancer and basic-sku public IP address with an azurerm_kubernetes_cluster resource?


Provisioning a minimal AKS cluster with azurerm_kubernetes_cluster using the defaults creates a cluster with load_balancer_sku = standard along with a load_balancer_profile block. A load balancer and IP address are then visible in the resource list at portal.azure.com.

However, setting load_balancer_sku = basic creates "nothing" - no mention of it in the Terraform state (other than the label), no load_balancer_profile block, and no resources created in portal.azure.com. (I also cannot find any information about how to correctly create an AKS cluster that works with a basic-sku azurerm_lb and basic-sku azurerm_public_ip, so links to comprehensive resources are appreciated.)

From the bits and pieces of info I did find, I glued together the following code:

# azurerm_kubernetes_cluster
resource "azurerm_kubernetes_cluster" "k8s" {
  location            = azurerm_resource_group.rg.location
  name                = var.name
  resource_group_name = azurerm_resource_group.rg.name
  node_resource_group = local.node_resource_group_name
  dns_prefix          = local.dns_prefix

  identity {
    type = "SystemAssigned"
  }

  default_node_pool {
    name       = "agentpool"
    vm_size    = "Standard_B2s"
    node_count = 1
  }

  network_profile {
    network_plugin = "kubenet"
    load_balancer_sku = "basic" # ref: https://learn.microsoft.com/en-us/azure/load-balancer/load-balancer-overview#pricing-and-sla
  }
}
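
The cluster above references locals and variables that are not shown; for completeness, here is a minimal sketch of what they might look like (the values are assumptions inferred from how the names are used, not necessarily my actual config):

# hypothetical supporting definitions, inferred from the references above
variable "name" {
  type        = string
  description = "Base name for the cluster and related resources."
}

variable "location" {
  type    = string
  default = "eastus"
}

locals {
  node_resource_group_name = "${var.name}-nrg" # assumed naming convention
  dns_prefix               = var.name          # assumed
}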

# azurerm_public_ip
resource "azurerm_public_ip" "k8s_lb_ip" {
  name                = "PublicIPForLB"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_kubernetes_cluster.k8s.node_resource_group # NOTE: is this really required? I do not want the IP in the NRG, as it will be destroyed if Azure updates the node pool!
  allocation_method   = "Static"
}

# azurerm_lb
resource "azurerm_lb" "k8s_lb" {
  name                = "BasicLoadBalancer"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name

  frontend_ip_configuration {
    name                 = "PublicIPAddress"
    public_ip_address_id = azurerm_public_ip.k8s_lb_ip.id
  }
}

# azurerm_role_assignment
# ref: https://learn.microsoft.com/en-us/azure/aks/static-ip?source=recommendations#create-a-service-using-the-static-ip-address
resource "azurerm_role_assignment" "lb_ip_aks_integration" {
  description = "Allows a basic-sku load balancer and basic-sku Public IP integration with an AKS cluster whose load_balancer_sku = basic."
  principal_id                     = azurerm_kubernetes_cluster.k8s.kubelet_identity[0].object_id
  role_definition_name             = "Network Contributor"
  scope                            = azurerm_public_ip.k8s_lb_ip.id
  skip_service_principal_aad_check = true

The last stanza uses azurerm_role_assignment. The Microsoft Learn article referenced in the comment above discusses setting up a standard LB with azure-cli. It implies that azurerm_public_ip.resource_group_name must be set to the cluster's node resource group name. This is not ideal, so I hope it isn't true: I want the IP address to live in my own resource group so that it is not destroyed if the auto-managed NRG is. The scope is inferred from that azure-cli command, and the principal_id usage comes from the official Terraform example on integrating ACR and AKS via the cluster's default SystemAssigned identity, shown below for reference.
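
For reference, that ACR/AKS pattern looks roughly like this (a sketch only; the registry resource is hypothetical and merely illustrates how the kubelet identity is granted a role on another resource):

# hypothetical registry, included only to illustrate the role-assignment pattern
resource "azurerm_container_registry" "acr" {
  name                = "exampleacr"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  sku                 = "Basic"
}

resource "azurerm_role_assignment" "aks_acr_pull" {
  principal_id                     = azurerm_kubernetes_cluster.k8s.kubelet_identity[0].object_id
  role_definition_name             = "AcrPull"
  scope                            = azurerm_container_registry.acr.id
  skip_service_principal_aad_check = true
}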

In any case, still no luck. If I add the following resource, for example:

resource "helm_release" "nginx_ingress" {
  name = "nginx-ingress-controller"

  repository = "https://charts.bitnami.com/bitnami"
  chart      = "nginx-ingress-controller"

  set {
    name  = "controller.service.externalTrafficPolicy"
    value = "Local"
  }

  set {
    name  = "controller.service.loadBalancerIP"
    value = var.public_ip # i.e. azurerm_public_ip.k8s_lb_ip.ip_address
  }
}

A separate IP address and load balancer are created automatically, unrelated to and in addition to those I specified in the code above.

What am I missing? How do I successfully get the resources I provisioned to coordinate with each other as expected?


Solution

  • The following allows a basic-sku LB and IP to be used with an AKS cluster whose load_balancer_sku = basic.

    The answer is that the above pattern is obsolete. We used it to establish a consistent public IP address that would never change (for DNS and other purposes), and we attached it to a custom load balancer.

    This allowed reconfiguring the load balancer without destroying the entire cluster. The load balancer and custom public IP could live in their own resource group, and granting the SystemAssigned managed identity "Contributor" access to them made it all work.

    The documentation states that node resource group (NRG) resources, which are auto-managed by Microsoft, cannot be altered, and as of AzureRM provider 3.0 this includes the load balancer. In other words, the LB is now automatically created inside the NRG, so it is no longer possible to create your own load balancer resource that works with an AKS cluster.

    However, you can still create a custom public IP address, which, with the right configuration, can be used with DNS zones, etc. The pattern now is to create a custom public IP address and place it in the auto-managed NRG. You are then forced to use Microsoft's service annotations to wire it up, like this:

    apiVersion: v1
    kind: Service
    metadata:
      annotations:
        service.beta.kubernetes.io/azure-load-balancer-resource-group: <node resource group name>
        service.beta.kubernetes.io/azure-pip-name: myAKSPublicIP
      name: azure-load-balancer
    spec:
      type: LoadBalancer
      ports:
      - port: 80
      selector:
        app: azure-load-balancer
    

    From a terraform perspective, here is what it all looks like put together:

    # 1. create the resource group
    
    # az group create --name myNetworkResourceGroup --location eastus
    resource "azurerm_resource_group" "rg" {
      name     = "${var.name}-rg"
      location = var.location
    }
    
    # 2. create the BASIC cluster
    # az aks create --name myAKSCluster --resource-group myNetworkResourceGroup --generate-ssh-keys --tier free --node-count 1
    resource "azurerm_kubernetes_cluster" "k8s" {
      location            = azurerm_resource_group.rg.location
      name                = var.name
      resource_group_name = azurerm_resource_group.rg.name
      node_resource_group = local.node_resource_group_name
      dns_prefix          = local.dns_prefix
    
      identity {
        type = "SystemAssigned"
      }
    
      default_node_pool {
        name       = "agentpool"
        vm_size    = "Standard_B2s"
        node_count = 1
      }
    
      network_profile {
        network_plugin    = "kubenet"
        load_balancer_sku = "basic" # ref: https://learn.microsoft.com/en-us/azure/load-balancer/load-balancer-overview#pricing-and-sla
      }
    }
    
    # 3. create the custom public IP address and place it in the NRG
    resource "azurerm_public_ip" "k8s_lb_ip" {
      name                = "PublicIPForLB"
      location            = azurerm_resource_group.rg.location
      resource_group_name = azurerm_kubernetes_cluster.k8s.node_resource_group # IMPORTANT: this must be the node resource group name!
      allocation_method   = "Static"
      sku                 = "Basic"
    }
    
    # 4. assign the kubelet identity the role on the node resource group id itself (which seems counterintuitive)
    # ref: https://learn.microsoft.com/en-us/azure/aks/static-ip?source=recommendations#create-a-service-using-the-static-ip-address
    resource "azurerm_role_assignment" "lb_ip_aks_integration" {
      description          = "Allows a basic-sku load balancer and basic-sku Public IP integration with an AKS cluster whose load_balancer_sku = basic."
      principal_id         = azurerm_kubernetes_cluster.k8s.kubelet_identity[0].object_id
      role_definition_name = "Network Contributor"
      scope                            = azurerm_kubernetes_cluster.k8s.node_resource_group_id
      skip_service_principal_aad_check = true
    }
    

    Unlike in versions 2.99 and earlier, where we explicitly involved the load balancer details, we now omit them altogether. Instead, because the LB is auto-created, we have to use the annotations, which in Terraform looks like this:

    resource "kubernetes_manifest" "service" {
      depends_on = [azurerm_kubernetes_cluster.k8s]
      manifest = yamldecode(<<YAML
    apiVersion: v1
    kind: Service
    metadata:
      annotations:
        service.beta.kubernetes.io/azure-load-balancer-resource-group: autolbtest-nrg
        service.beta.kubernetes.io/azure-pip-name: PublicIPForLB
      name: azure-load-balancer
      namespace: default
    spec:
      type: LoadBalancer
      ports:
      - port: 80
      selector:
        app: azure-load-balancer
    YAML
      )
    }
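
    Note that kubernetes_manifest requires the kubernetes provider to be pointed at the cluster. A minimal sketch of that wiring using the AKS outputs (assuming client-certificate auth is enabled on the cluster):

    # hypothetical provider wiring for the kubernetes_manifest resource above
    provider "kubernetes" {
      host                   = azurerm_kubernetes_cluster.k8s.kube_config[0].host
      client_certificate     = base64decode(azurerm_kubernetes_cluster.k8s.kube_config[0].client_certificate)
      client_key             = base64decode(azurerm_kubernetes_cluster.k8s.kube_config[0].client_key)
      cluster_ca_certificate = base64decode(azurerm_kubernetes_cluster.k8s.kube_config[0].cluster_ca_certificate)
    }

    Also note that kubernetes_manifest contacts the cluster at plan time, so this generally works best in a separate apply (or a targeted one) once the cluster already exists.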
    
    

    TODO: use interpolation for the public ip name and the name of the resource group into which the public ip is created
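
    One way to close out that TODO, as a sketch assuming the resources defined earlier in this answer, is to interpolate the attribute references directly into the heredoc:

    # same Service as above, with the NRG and public IP names interpolated
    resource "kubernetes_manifest" "service" {
      depends_on = [azurerm_kubernetes_cluster.k8s]
      manifest = yamldecode(<<YAML
    apiVersion: v1
    kind: Service
    metadata:
      annotations:
        service.beta.kubernetes.io/azure-load-balancer-resource-group: ${azurerm_kubernetes_cluster.k8s.node_resource_group}
        service.beta.kubernetes.io/azure-pip-name: ${azurerm_public_ip.k8s_lb_ip.name}
      name: azure-load-balancer
      namespace: default
    spec:
      type: LoadBalancer
      ports:
      - port: 80
      selector:
        app: azure-load-balancer
    YAML
      )
    }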