
Availability with Kubernetes


We run an internal health check of the service every 5 seconds, and Kubernetes liveness probes run every 1 second. So in the worst case, the Kubernetes load balancer's view of a pod is up to 6 seconds out of date.
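For reference, the probe side of that setup would look roughly like the sketch below; the endpoint path and port are placeholders, not our actual values.

    livenessProbe:
      httpGet:
        path: /healthz        # placeholder health-check endpoint
        port: 8080            # placeholder container port
      periodSeconds: 1        # kubelet probes every second, as described above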

My question is: what happens when a client request hits a pod which is broken but not yet seen by the load balancer as unhealthy? Should the client implement retry logic? Or should we implement backend logic to handle the case when a request hits a pod that is not yet seen as unhealthy by the load balancer?


Solution

  • I'm not sure exactly how your architecture looks, but load balancers are generally set up in front of an ingress controller such as NGINX.

    The load balancer, backed by the ingress controller, forwards traffic to the Kubernetes Service, and it is the Service that routes requests to the pods, not the load balancer.

    The Service routes requests to pods based on readiness: if a pod is NotReady, it is removed from the Service's endpoints and requests won't reach it. If, because of the propagation delay, a request still reaches such a pod, there is a chance the client gets an internal error back. A minimal readiness probe is sketched below.
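
    To illustrate, a readiness probe might look like the following; the path, port, and timings are assumptions, not taken from the question.

    readinessProbe:
      httpGet:
        path: /healthz        # assumed health-check endpoint
        port: 8080            # assumed container port
      periodSeconds: 1        # check every second
      failureThreshold: 1     # mark the pod NotReady after one failed check

    Once the probe fails, the pod is dropped from the Service's endpoints and stops receiving new traffic.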

    Retries

    Yes, you can implement retries on the client side, but since you are on Kubernetes you can also offload the retries to a service mesh. That makes the retry logic easier to maintain and integrate with Kubernetes.

    You can use a service mesh such as Istio and define the retry policy at the VirtualService level:

    retries:
      attempts: 5
      retryOn: 5xx
    

    Virtual service

    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: ratings
    spec:
      hosts:
      - ratings
      http:
      - route:
        - destination:
            host: ratings
            subset: v1
        retries:
          attempts: 3
          perTryTimeout: 2s
    

    Read more at: https://istio.io/latest/docs/concepts/traffic-management/#retries
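
    With a retry policy like the one above, a request that lands on a pod that is broken but not yet marked unhealthy is automatically retried against another pod, which covers the stale-window scenario from your question.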