kubernetes, google-kubernetes-engine, kubernetes-service, kubernetes-networking, haproxy-ingress

GCP-LB unevenly distributing traffic to HAProxy Ingress Controller Pods


As the title suggests, the GCP load balancer (created for the HAProxy Ingress Controller Service, which is exposed as type LoadBalancer) is distributing traffic unevenly across the HAProxy Ingress Controller Pods.

Setup:
I am running a GKE cluster in GCP and using HAProxy as the ingress controller.
The HAProxy Service is exposed as type LoadBalancer with a static IP.

YAML for HAProxy service:

apiVersion: v1
kind: Service
metadata:
  name: haproxy-ingress-static-ip
  namespace: haproxy-controller
  labels:
    run: haproxy-ingress-static-ip
  annotations:
    cloud.google.com/load-balancer-type: "Internal"
    networking.gke.io/internal-load-balancer-allow-global-access: "true"
    cloud.google.com/network-tier: "Premium"
    cloud.google.com/neg: '{"ingress": false}'
spec:
  selector:
    run: haproxy-ingress
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  - name: https
    port: 443
    protocol: TCP
    targetPort: 443
  - name: stat
    port: 1024
    protocol: TCP
    targetPort: 1024
  type: LoadBalancer
  loadBalancerIP: "10.0.0.76"

YAML for HAProxy Deployment:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: haproxy-ingress
  name: haproxy-ingress
  namespace: haproxy-controller
spec:
  replicas: 2
  selector:
    matchLabels:
      run: haproxy-ingress
  template:
    metadata:
      labels:
        run: haproxy-ingress 
    spec:
      serviceAccountName: haproxy-ingress-service-account
      containers:
      - name: haproxy-ingress
        image: haproxytech/kubernetes-ingress
        args:
          - --configmap=haproxy-controller/haproxy
          - --default-backend-service=haproxy-controller/ingress-default-backend
        ports:
        - name: http
          containerPort: 80
        - name: https
          containerPort: 443
        - name: stat
          containerPort: 1024
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: run
                      operator: In
                      values: 
                        - haproxy-ingress
                topologyKey: kubernetes.io/hostname

HAProxy ConfigMap:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: haproxy
  namespace: haproxy-controller
data:

Problem:
While debugging another issue, I found that traffic is distributed unevenly across the HAProxy Pods. For example, one Pod was receiving ~540k requests/sec while another was receiving ~80k requests/sec.

On further investigation, I also found that newly started Pods don't receive any traffic for the first 20-30 minutes, and even after that only a small share of the traffic is routed to them.

Check the graph below: [graph omitted]

Another version of the uneven traffic distribution. This doesn't seem random at all; it looks like a weighted traffic distribution: [graph omitted]

Yet another version of the uneven traffic distribution. Traffic from one Pod seems to be shifting towards the other Pod.

[graph omitted]

What could be causing this uneven traffic distribution, and why don't new Pods receive traffic for such a long time?


Solution

  • Kubernetes is integrated with the GCP load balancers. K8s provides primitives such as Ingress and Service for users to expose Pods through L4/L7 load balancers. Before the introduction of NEGs, the load balancer distributed traffic to the VM instances, and kube-proxy programmed iptables rules on each node to forward that traffic on to the backend Pods. This extra hop can lead to uneven traffic distribution, unreliable load balancer health checks, and a network performance impact.

    I suggest you use container-native load balancing, which allows load balancers to target Kubernetes Pods directly and to distribute traffic evenly across them. With container-native load balancing, traffic is delivered straight to the Pods that should receive it, eliminating the extra network hop. It also improves health checking, because the load balancer targets the Pods directly, and it gives you visibility into the latency from the HTTP(S) load balancer to each individual Pod; with node-IP-based load balancing that latency was only visible in aggregate per node. This makes troubleshooting your services at the NEG level easier. A rough sketch of the change is shown after this answer.

    Container-native load balancing does not support internal TCP/UDP load balancers or network load balancers, so to use it you would have to split the service into HTTP (80), HTTPS (443) and TCP (1024), exposing the HTTP/HTTPS part through an Ingress (see the Ingress sketch below). To use it, your cluster must also have HTTP load balancing enabled. GKE clusters have HTTP load balancing enabled by default; you must not disable it.
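
A minimal sketch of what enabling NEGs on the Service could look like, assuming the same Pod labels as the manifest above. The name haproxy-ingress-neg is a placeholder, and switching to type ClusterIP is an assumption, since the HTTP(S) load balancer would then be created by an Ingress rather than by the Service itself:

# Sketch only: Service annotated for container-native load balancing (NEGs)
apiVersion: v1
kind: Service
metadata:
  name: haproxy-ingress-neg                     # placeholder name
  namespace: haproxy-controller
  annotations:
    cloud.google.com/neg: '{"ingress": true}'   # enable NEGs (the manifest above sets this to false)
spec:
  type: ClusterIP                               # no LoadBalancer; the Ingress below creates the HTTP(S) LB
  selector:
    run: haproxy-ingress
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  - name: https
    port: 443
    protocol: TCP
    targetPort: 443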
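
And a sketch of an Ingress that would put an HTTP(S) load balancer in front of those NEGs. The ingress-class value is an assumption (gce-internal mirrors the internal load balancer used above; plain gce would create an external one), and host/TLS rules are omitted:

# Sketch only: GKE Ingress fronting the NEG-backed Service (HTTP/HTTPS only;
# the stat port 1024 would still need to be exposed separately over TCP)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: haproxy-ingress-lb                      # placeholder name
  namespace: haproxy-controller
  annotations:
    kubernetes.io/ingress.class: "gce-internal" # assumption: internal HTTP(S) load balancer
spec:
  defaultBackend:
    service:
      name: haproxy-ingress-neg                 # the NEG-backed Service sketched above
      port:
        number: 80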