azure, istio, istio-gateway

Installing istio-ingressgateway using Helm into a TKG private cluster in Azure fails - the service does not get the IP of the existing internal load balancer


I have a TKG 2.1.1 (Kubernetes 1.24.10) cluster deployed in Azure in a private network that already has an internal load balancer provisioned (by the Tanzu installer). When attempting to deploy the istio-ingressgateway, the service is stuck in pending.

Install command:

helm install -f values.yaml istio-ingressgateway istio/gateway -n istio-ingress --wait

values.yaml:

service:
  type: LoadBalancer
  ports:
  - name: status-port
    port: 15021
    protocol: TCP
    targetPort: 15021
  - name: http2
    port: 80
    protocol: TCP
    targetPort: 80
  - name: https
    port: 443
    protocol: TCP
    targetPort: 443
  annotations: 
    service.beta.kubernetes.io/azure-load-balancer-internal: 'true'

I have also attempted to run an upgrade with alterations to the values file. Revision 2:

service:
  type: LoadBalancer
  ports:
  - name: status-port
    port: 15021
    protocol: TCP
    targetPort: 15021
  - name: http2
    port: 80
    protocol: TCP
    targetPort: 80
  - name: https
    port: 443
    protocol: TCP
    targetPort: 443
  annotations: 
    service.beta.kubernetes.io/azure-load-balancer-internal: 'true'
    service.beta.kubernetes.io/azure-load-balancer-ipv4: <existing lb ip>

Revision 3:

service:
  type: LoadBalancer
  ports:
  - name: status-port
    port: 15021
    protocol: TCP
    targetPort: 15021
  - name: http2
    port: 80
    protocol: TCP
    targetPort: 80
  - name: https
    port: 443
    protocol: TCP
    targetPort: 443
  annotations: 
    service.beta.kubernetes.io/azure-load-balancer-internal: 'true'
    service.beta.kubernetes.io/azure-load-balancer-internal-subnet: app-pln-snet

Regardless of the values used, the status returns:

helm status istio-ingressgateway -n istio-ingress
NAME: istio-ingressgateway
LAST DEPLOYED: Thu Jun  1 05:23:31 2023
NAMESPACE: istio-ingress
STATUS: failed
REVISION: 3
TEST SUITE: None
NOTES:
"istio-ingressgateway" successfully installed!

And the service looks like:

 kubectl describe service istio-ingressgateway -n istio-ingress
Name:                     istio-ingressgateway
Namespace:                istio-ingress
Labels:                   app=istio-ingressgateway
                          app.kubernetes.io/managed-by=Helm
                          app.kubernetes.io/name=istio-ingressgateway
                          app.kubernetes.io/version=1.17.2
                          helm.sh/chart=gateway-1.17.2
                          istio=ingressgateway
Annotations:              meta.helm.sh/release-name: istio-ingressgateway
                          meta.helm.sh/release-namespace: istio-ingress
                          service.beta.kubernetes.io/azure-load-balancer-internal: true
                          service.beta.kubernetes.io/azure-load-balancer-internal-subnet: app-pln-snet
Selector:                 app=istio-ingressgateway,istio=ingressgateway
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       100.69.48.176
IPs:                      100.69.48.176
Port:                     status-port  15021/TCP
TargetPort:               15021/TCP
NodePort:                 status-port  32090/TCP
Endpoints:                100.96.1.230:15021
Port:                     http2  80/TCP
TargetPort:               80/TCP
NodePort:                 http2  31815/TCP
Endpoints:                100.96.1.230:80
Port:                     https  443/TCP
TargetPort:               443/TCP
NodePort:                 https  30364/TCP
Endpoints:                100.96.1.230:443
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>
kubectl get service istio-ingressgateway -n istio-ingress -o wide
NAME                   TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)                                      AGE   SELECTOR
istio-ingressgateway   LoadBalancer   100.69.48.176   <pending>     15021:32090/TCP,80:31815/TCP,443:30364/TCP   42m   app=istio-ingressgateway,istio=ingressgateway

The expectation is that the istio-ingressgateway would connect to the existing Azure internal LB and get its IP.
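
For reference, the Azure cloud provider running inside the cluster usually logs the reason a LoadBalancer IP cannot be assigned. A rough way to pull those logs (the pod name is an assumption and varies by TKG version):

kubectl get pods -n kube-system | grep -i cloud
kubectl logs -n kube-system <cloud-provider-pod-from-above>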


Solution

  • A couple of changes were made to my initial deployment of TKG 2.1.1 and Istio 1.17.2 in order to get this to work, and to fix the issue I had to destroy the TKG workload cluster and rebuild it.

    The cluster definition yaml used to deploy the workload cluster needed to be altered. The change was to comment out the values for creating the outbound LBs:

    ...
    # AZURE_ENABLE_CONTROL_PLANE_OUTBOUND_LB: true
    # AZURE_ENABLE_NODE_OUTBOUND_LB: true
    # AZURE_CONTROL_PLANE_OUTBOUND_LB_FRONTEND_IP_COUNT: 1
    # AZURE_NODE_OUTBOUND_LB_FRONTEND_IP_COUNT: 1
    # AZURE_NODE_OUTBOUND_LB_IDLE_TIMEOUT_IN_MINUTES: 4
    ...
    

    These values told Tanzu to create both an internal LB and an outbound LB for the compute plane. In the end, when installing Istio 1.17.2 via Helm, the ingress gateway creation was not able to reconcile against the internal load balancer that had already been generated for the control plane. In this case, Istio must be allowed to create the internal LB for the compute plane in the cluster, so you cannot have TKG do that.

    The next aspect of the problem is a mismatch of Azure NSG naming. Because we are deploying into a private cluster configuration on Azure, the network, subnets, and NSG already exist. When building it this way, Tanzu expects the NSG name for the compute plane snet to be cluster-name-node-nsg, and it must reside in the resource group with the vnet/snets. However, when Istio attempts to build the internal LB it looks for an NSG named cluster-name-id-node-nsg and fails this check when it doesn't find it.
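
    A quick way to confirm which NSG and LB names actually exist before changing anything is the Azure CLI; a small sketch, where the resource group placeholder is whichever group holds the cluster networking resources in your environment:

    az network nsg list --resource-group <resource-group> --query "[].name" --output table
    az network lb list --resource-group <resource-group> --query "[].name" --output table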

    To reconcile this after the cluster has been generated by TKG, you can search the Azure portal for the internal LB that is created for the control plane. It will be named cluster-name-id-internal-lb. You can then create a new NSG named cluster-name-id-node-nsg, using the same id that appears in the LB resource name. The new NSG must be in the same resource group as the vnet, and you must assign it to the compute plane snet of the cluster. This replaces the NSG that was previously set up in the private network in order to install TKG, so you also need to ensure it has the same rules as the NSG it is replacing.
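
    The portal steps above can also be scripted with the Azure CLI; a rough sketch, assuming the resource group, vnet, cluster name, and id placeholders are replaced with values from your own environment:

    # read the generated id from the control plane internal LB name (cluster-name-id-internal-lb)
    az network lb list --resource-group <resource-group> --query "[].name" --output table

    # create the NSG with the expected name, in the same resource group as the vnet
    az network nsg create --resource-group <vnet-resource-group> --name <cluster-name>-<id>-node-nsg

    # list the rules on the old node NSG so they can be recreated on the new one
    az network nsg rule list --resource-group <vnet-resource-group> --nsg-name <cluster-name>-node-nsg --output table

    # assign the new NSG to the compute plane snet
    az network vnet subnet update --resource-group <vnet-resource-group> --vnet-name <vnet-name> --name app-pln-snet --network-security-group <cluster-name>-<id>-node-nsg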

    Once the new NSG is in place, Istio will create a new LB named cluster-name-internal with the compute plane as the backend, and the service will get the private IP. You only need to pass these values in the values.yaml with the helm install from the question for that to work:

    service:
      type: LoadBalancer
      ports:
      - name: status-port
        port: 15021
        protocol: TCP
        targetPort: 15021
      - name: http2
        port: 80
        protocol: TCP
        targetPort: 80
      - name: https
        port: 443
        protocol: TCP
        targetPort: 443
      annotations: 
        service.beta.kubernetes.io/azure-load-balancer-internal: 'true'
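
    Once the install completes, the service should pick up a private IP from the subnet rather than staying in pending; a quick check:

    kubectl get service istio-ingressgateway -n istio-ingress
    # EXTERNAL-IP should now show a private address from app-pln-snet instead of <pending>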