
How to make a circuit-breaker in Istio?


I am trying to configure a circuit breaker in Istio. This is the YAML:

trafficPolicy:
    connectionPool:
      http:
        http1MaxPendingRequests: 1
        maxRequestsPerConnection: 1
      tcp:
        maxConnections: 1
    outlierDetection:
      baseEjectionTime: 1m
      consecutive5xxErrors: 1
      interval: 1s

I have several thread groups in JMeter that continuously hit the service associated with the above circuit breaker. I expect that, upon receiving an error response, the circuit breaker makes the service unavailable for 1 minute. But that is not happening.

Am I misunderstanding how it works? Is there any way to achieve that?


Solution

  • I think you are confusing outlier detection with the circuit breaker that is configured through the connectionPool settings.

    The settings you apply in connectionPool configure a circuit breaker that trips when any of the limits is breached. While it is tripped, new requests get an immediate 503 response from the Istio proxy and are not sent to the application. However, the proxy accepts new requests again as soon as doing so no longer breaches a limit. There is no such thing as breaking the circuit for 1 minute in this context.

    Outlier detection is different. It works by ejecting a particular error-prone pod from the load-balancing pool. Suppose you have 4 replica pods running for your deployment, and one of them is returning 5xx errors. (The 503 errors sent by the proxy, as in the connection pool breach case, are not counted here. This count covers your application's errors.) In this case Istio waits until the pod reaches consecutive5xxErrors (1 in your case) and then removes that pod from load balancing. The first ejection lasts for baseEjectionTime (1m in your case), and no new requests are sent to the error-prone pod during that time. After 1 minute the pod is added back to the load-balancing pool. If the pod reaches consecutive5xxErrors again, Istio ejects it for 2 x baseEjectionTime, which is 2 minutes in your case. The ejection time keeps growing with each successive ejection until the pod goes back to returning non-5xx responses.
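
    To show where the two mechanisms sit relative to each other, here is a minimal sketch of a complete DestinationRule built around the trafficPolicy from the question. The resource name and the host are hypothetical placeholders, and the apiVersion is an assumption that may need adjusting for your Istio version. The values are the ones from the question, not recommendations.

    ```yaml
    # Sketch only: metadata.name and spec.host are hypothetical placeholders.
    apiVersion: networking.istio.io/v1beta1   # assumption: adjust to your Istio version
    kind: DestinationRule
    metadata:
      name: my-service-circuit-breaker        # hypothetical name
    spec:
      host: my-service.default.svc.cluster.local   # hypothetical host
      trafficPolicy:
        # Limit-based circuit breaking: requests beyond these limits
        # get an immediate 503 from the proxy, with no fixed "open" period.
        connectionPool:
          tcp:
            maxConnections: 1
          http:
            http1MaxPendingRequests: 1
            maxRequestsPerConnection: 1
        # Per-pod ejection: a pod that returns consecutive5xxErrors in a row
        # is removed from the load-balancing pool for baseEjectionTime.
        outlierDetection:
          consecutive5xxErrors: 1
          interval: 1s
          baseEjectionTime: 1m
    ```

    With this in place, the 1-minute window applies to an individual failing pod, not to the service as a whole. The remaining healthy pods keep receiving traffic.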