spring-boot microservices circuit-breaker retry-logic resilience4j-retry

What is the difference between Circuit Breaker and Retry in a Spring Boot microservice?


One of my colleagues asked me what the difference between Circuit Breaker and Retry is, but I was not able to answer him correctly. All I know is that a circuit breaker is useful if there is a heavy request payload, but this can also be achieved using retry. So when should I use Circuit Breaker and when Retry?

Also, is it possible to use both on the same API?


Solution

  • Several years ago I wrote a resilience catalog to describe different mechanisms. I originally created this document for co-workers and then shared it publicly. Please allow me to quote the relevant parts here.

    Retry

    Categories: reactive, after the fact

    The relation between retries and attempts: n retries means at most n+1 attempts. The +1 is the initial request; if it fails (for whatever reason), the retry logic kicks in. In other words, the 0th step is executed with 0 delay penalty.
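    The relation between retries and attempts can be illustrated with a minimal sketch. It is plain Java rather than the Resilience4j API, and the names RetrySketch, callWithRetry, maxRetries and delayMillis are hypothetical, chosen only for this example. It assumes a fixed delay between attempts and treats every exception as retryable, which a real policy would probably narrow down.

```java
import java.util.concurrent.Callable;

// Hypothetical helper for illustration only; not part of Spring or Resilience4j.
final class RetrySketch {
    private RetrySketch() {}

    // maxRetries = n means at most n + 1 attempts: the initial call plus n retries.
    static <T> T callWithRetry(Callable<T> operation, int maxRetries, long delayMillis) throws Exception {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        Exception lastFailure = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (attempt > 0) {
                // Only retries pay the delay penalty; the 0th step runs immediately.
                Thread.sleep(delayMillis);
            }
            try {
                return operation.call();
            } catch (Exception e) {
                // Assumption: every failure is treated as transient and retryable.
                lastFailure = e;
            }
        }
        // All attempts failed: surface the last failure to the caller.
        throw lastFailure;
    }
}
```

    With maxRetries set to 2, an operation that always fails is invoked three times before the last exception reaches the caller.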

    There are situations where your requested operation relies on a resource that might not be reachable at a certain point in time. In other words, there can be a temporary issue that will be gone sooner or later. Issues of this sort can cause transient failures. With retries you can overcome these problems by attempting to redo the same operation at a specific moment in the future. To be able to use this mechanism, the following group of criteria should be met:

    Let’s review them one by one:

    Circuit Breaker

    Categories: proactive, before the fact

    It is hard to categorize the circuit breaker because it is pro- and reactive at the same time. It detects that a given downstream system is malfunctioning (reactive) and it protects the downstream systems from being flooded with new requests (proactive).

    This is one of the most complex patterns, mainly because it uses different states to define different behaviours. Before we jump into the details, let's see why this tool exists at all:

    Circuit breaker detects failures and prevents the application from trying to perform the action that is doomed to fail (until it is safe to retry) - Wikipedia

    So, this tool works as a mini data and control plane. The requests go through this proxy, which examines the responses (if any) and counts consecutive failures. If a predefined threshold is reached, the transfer is suspended temporarily and requests fail immediately.

    It prevents cascading failures. In other words the transient failure of a downstream system should not be propagated to the upstream systems. By concealing the failure we are actually preventing a chain reaction (domino effect) as well.

    It must somehow determine when it would be safe to operate as a proxy again. For example, it can use the same detection mechanism that was used during the original failure detection. So, it works like this: after a given period of time it allows a single request to go through and examines the response. If it succeeds, the downstream is treated as healthy. Otherwise nothing changes (no request is transferred through this proxy); only the timer is reset.

    The circuit breaker can be in any of the following states: Closed, Open, HalfOpen.
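    The state transitions described above can be sketched in plain Java. This is a simplified illustration under stated assumptions, not how Resilience4j implements its circuit breaker: the class CircuitBreakerSketch, its CircuitOpenException, and the parameters failureThreshold and openMillis are all hypothetical names. It counts consecutive failures, and it serializes calls with synchronized purely to keep the example short.

```java
import java.util.concurrent.Callable;
import java.util.function.LongSupplier;

// Hypothetical, simplified circuit breaker for illustration only.
final class CircuitBreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    static final class CircuitOpenException extends RuntimeException {
        CircuitOpenException(String message) { super(message); }
    }

    private final int failureThreshold; // consecutive failures that open the circuit
    private final long openMillis;      // how long the circuit stays open
    private final LongSupplier clock;   // injected so the timer can be controlled
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0L;

    CircuitBreakerSketch(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    synchronized State state() { return state; }

    synchronized <T> T call(Callable<T> operation) throws Exception {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < openMillis) {
                // Fail immediately without touching the downstream system.
                throw new CircuitOpenException("circuit is open, failing fast");
            }
            state = State.HALF_OPEN; // the period elapsed: allow a single probe request
        }
        try {
            T result = operation.call();
            state = State.CLOSED;    // success: the downstream is treated as healthy
            consecutiveFailures = 0;
            return result;
        } catch (Exception e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;
                openedAt = clock.getAsLong(); // (re)start the timer
            }
            throw e;
        }
    }
}
```

    A breaker created with new CircuitBreakerSketch(2, 1000L, System::currentTimeMillis) would open after two consecutive failures and let a probe request through roughly one second later.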

    Resiliency strategy

    The above two mechanisms / policies are not mutually exclusive, on the contrary. They can be combined via the escalation mechanism. If the inner policy can't handle the problem it can propagate one level up to an outer policy.

    When you try to perform a request while the Circuit Breaker is Open, it will throw an exception. Your retry policy could trigger on that exception and adjust its sleep duration (to avoid unnecessary attempts).
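    One way to picture this escalation is an outer retry that inspects the failure before choosing how long to sleep. The sketch below is plain Java under assumptions: EscalationSketch, nextDelay and a CircuitOpenException carrying the remaining open time are hypothetical, and real libraries may not expose the remaining duration in this form.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

// Hypothetical outer retry policy wrapping a call guarded by a circuit breaker.
final class EscalationSketch {
    private EscalationSketch() {}

    // Assumption: the breaker's exception reports how long it will stay open.
    static final class CircuitOpenException extends RuntimeException {
        final Duration remaining;
        CircuitOpenException(Duration remaining) {
            super("circuit is open for another " + remaining);
            this.remaining = remaining;
        }
    }

    // Sleep at least until the breaker may let a probe through; otherwise use the default.
    static Duration nextDelay(Exception failure, Duration defaultDelay) {
        if (failure instanceof CircuitOpenException) {
            Duration remaining = ((CircuitOpenException) failure).remaining;
            return remaining.compareTo(defaultDelay) > 0 ? remaining : defaultDelay;
        }
        return defaultDelay;
    }

    static <T> T callWithRetry(Callable<T> guardedCall, int maxRetries, Duration defaultDelay) throws Exception {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        Exception lastFailure = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return guardedCall.call();
            } catch (Exception e) {
                lastFailure = e;
                if (attempt < maxRetries) {
                    // Adjust the sleep so retries are not wasted on an open circuit.
                    Thread.sleep(nextDelay(e, defaultDelay).toMillis());
                }
            }
        }
        throw lastFailure;
    }
}
```

    Here the retry is the outer policy and the circuit breaker the inner one, so a fast failure from the open breaker costs one sleep instead of a burst of doomed attempts.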

    The downstream system can also inform the upstream that it is receiving too many requests by responding with a 429 (Too Many Requests) status code. The Circuit Breaker could also trigger on this and use the Retry-After header's value for its sleep duration.
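    Turning a Retry-After value into a sleep duration could look roughly like the sketch below. The header may carry either a number of seconds or an HTTP date, so both forms are handled. RetryAfterSketch, parseRetryAfter and the fallback parameter are hypothetical names for this example; how the resulting duration is fed into a concrete circuit breaker is left open.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper: converts a Retry-After header value into a wait duration.
final class RetryAfterSketch {
    private RetryAfterSketch() {}

    static Duration parseRetryAfter(String headerValue, Instant now, Duration fallback) {
        if (headerValue == null || headerValue.trim().isEmpty()) {
            return fallback; // header missing: keep the locally configured duration
        }
        String value = headerValue.trim();
        try {
            // First form: a non-negative number of seconds, e.g. "120".
            long seconds = Long.parseLong(value);
            return seconds >= 0 ? Duration.ofSeconds(seconds) : fallback;
        } catch (NumberFormatException notSeconds) {
            // Not a number; try the date form next.
        }
        try {
            // Second form: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
            Instant retryAt = ZonedDateTime.parse(value, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant();
            Duration wait = Duration.between(now, retryAt);
            return wait.isNegative() ? Duration.ZERO : wait;
        } catch (DateTimeParseException notDate) {
            return fallback; // unparseable value: ignore the hint
        }
    }
}
```

    Passing the current time in as a parameter keeps the function deterministic; a caller would typically supply Instant.now().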

    So, the whole point of this section is that you can define a protocol between client and server for overcoming transient failures together.