azure-aks azure-machine-learning-service vnet internal-load-balancer

Provision AKS with internal load balancer from AMLS on Azure


I would like to provision an AKS cluster from Azure Machine Learning (AMLS) that is connected to a vnet and uses an internal load balancer. I am using code that looks like this:

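from azureml.core.compute import AksCompute, ComputeTarget
from azureml.exceptions import ComputeTargetException

# ws = workspace object. Creation not shown in this snippet
aks_cluster_name = "myaks"

# Verify that cluster does not exist already
try:
    aks_target = AksCompute(workspace=ws, name=aks_cluster_name)
    print("Found existing aks cluster")

except ComputeTargetException:
    print("Creating new aks cluster")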

    # Subnet to use for AKS
    subnet_name = "default"
    # Create AKS configuration
    prov_config = AksCompute.provisioning_configuration(load_balancer_type="InternalLoadBalancer")
    # Set info for existing virtual network to create the cluster in
    prov_config.vnet_resourcegroup_name = "myvnetresourcegroup"
    prov_config.vnet_name = "myvnetname"
    prov_config.service_cidr = "10.0.0.0/16"
    prov_config.dns_service_ip = "10.0.0.10"
    prov_config.subnet_name = subnet_name
    prov_config.docker_bridge_cidr = "172.17.0.1/16"

    # Create compute target under the same name that was checked above
    aks_target = ComputeTarget.create(workspace=ws, name=aks_cluster_name, provisioning_configuration=prov_config)
    # Wait for the operation to complete
    aks_target.wait_for_completion(show_output=True)

However, I get the following error:

K8s failed to assign an IP for Load Balancer after waiting for an hour.

Is this because the AKS cluster does not yet have the 'Network Contributor' role on the vnet resource group? Is the only way to make this work to create AKS outside of AMLS first, grant it the Network Contributor role on the vnet resource group, then attach the cluster to AMLS and configure the internal load balancer afterwards?


Solution

  • I got this to work by first creating the AKS cluster without an internal load balancer, then updating the load balancer type in a separate step with this code:

    from azureml.core.compute import AksCompute
    from azureml.core.compute.aks import AksUpdateConfiguration

    # ws = workspace object. Creation not shown in this snippet
    aks_target = AksCompute(ws, "myaks")

    # Change to the name of the subnet that contains AKS
    subnet_name = "default"
    # Update AKS configuration to use an internal load balancer.
    # Positional arguments: SSL configuration (None leaves it unchanged),
    # load balancer type, load balancer subnet.
    update_config = AksUpdateConfiguration(None, "InternalLoadBalancer", subnet_name)
    aks_target.update(update_config)
    # Wait for the operation to complete
    aks_target.wait_for_completion(show_output=True)
    

    No Network Contributor role assignment was required.
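
For completeness, here is a minimal sketch that runs both steps in sequence. It assumes the first step still places the cluster in the vnet from the question and simply leaves load_balancer_type unset, which the answer does not state explicitly. The helper name create_then_switch_to_internal_lb and the vnet_settings dict are hypothetical; the SDK calls are only the ones already shown in the question and the answer.

```python
def create_then_switch_to_internal_lb(ws, cluster_name, vnet_settings, subnet_name):
    """Provision AKS without an internal load balancer, then enable it.

    Hypothetical helper: `vnet_settings` maps provisioning-configuration
    attribute names (e.g. "vnet_name", "service_cidr") to their values.
    """
    # Imported lazily so this module loads even where the SDK is absent
    from azureml.core.compute import AksCompute, ComputeTarget
    from azureml.core.compute.aks import AksUpdateConfiguration

    # Step 1: create the cluster with load_balancer_type left unset
    prov_config = AksCompute.provisioning_configuration()
    for attr, value in vnet_settings.items():
        setattr(prov_config, attr, value)
    aks_target = ComputeTarget.create(
        workspace=ws, name=cluster_name, provisioning_configuration=prov_config
    )
    aks_target.wait_for_completion(show_output=True)

    # Step 2: switch the existing cluster to an internal load balancer
    update_config = AksUpdateConfiguration(None, "InternalLoadBalancer", subnet_name)
    aks_target.update(update_config)
    aks_target.wait_for_completion(show_output=True)
    return aks_target
```

It would be called with the workspace object, the cluster name ("myaks"), a dict holding the vnet values from the question (vnet_resourcegroup_name, vnet_name, subnet_name, service_cidr, dns_service_ip, docker_bridge_cidr), and the name of the subnet that contains AKS ("default").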