spring-cloud-gatewaycircuit-breaker

java.util.concurrent.TimeoutException in Spring Cloud Gateway Circuit Breaker


After successfully applying Route function of Spring Cloud Gateway, I tried to add Circuit Breaker for handling connection timeout to downstream services. However it behaves somewhat different from what I expect.

Version info: (have to use Java8)

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.7.3</version>
    </parent>

    <properties>
        <java.version>1.8</java.version>
        <spring-cloud.version>2021.0.5</spring-cloud.version>
    </properties>

application.yml to reproduce the situation:

spring:
  cloud:
    gateway:
      routes:
        - id: test
          order: 0
          uri: http://{ANY_SITE_CAUSING_CONNECTION_TIMEOUT}:8080/
          predicates:
            - Path=/**
          filters:
            - name: CircuitBreaker
              args:
                name: cbTest
                fallbackUri: forward:/fallback

management:
(to check circuit breaker status via actuator, omitted)

resilience4j:
  circuitbreaker:
    configs:
      default:
        registerHealthIndicator: true
        minimumNumberOfCalls: 10
    instances:
      cbTest:
        baseConfig: default

With a fallback method to see the exception:

@RestController
@RequestMapping("/fallback")
public class FallbackController {
    @GetMapping
    public Mono<Void> fallback(ServerWebExchange exchange) {
        Throwable t = exchange.getAttribute(ServerWebExchangeUtils.CIRCUITBREAKER_EXECUTION_EXCEPTION_ATTR);
        Objects.requireNonNull(t).printStackTrace();
        return null;
    }
}

This results unexpected java.util.concurrent.TimeoutException as follows, and the call finishes in 1 second instead of trying connection.

java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 1000ms in 'circuitBreaker' (and no fallback has been configured)
    at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.handleTimeout(FluxTimeout.java:295)
    at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.doTimeout(FluxTimeout.java:280)
    at reactor.core.publisher.FluxTimeout$TimeoutTimeoutSubscriber.onNext(FluxTimeout.java:419)
    at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext(FluxOnErrorResume.java:79)
    at reactor.core.publisher.MonoDelay$MonoDelayRunnable.propagateDelay(MonoDelay.java:271)
    at reactor.core.publisher.MonoDelay$MonoDelayRunnable.run(MonoDelay.java:286)
    at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68)
    at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28)
    at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
    at java.util.concurrent.FutureTask.run(FutureTask.java)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

And Circuit Breaker records the corresponding event.

http://localhost:8080/actuator/circuitbreakerevents

{
    "circuitBreakerEvents": [
        {
            "circuitBreakerName": "cbTest",
            "type": "ERROR",
            "creationTime": "2023-02-14T10:21:12.633+09:00[Asia/Seoul]",
            "errorMessage": "java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 1000ms in 'circuitBreaker' (and no fallback has been configured)",
            "durationInMs": 1000,
            "stateTransition": null
        }
    ]
}

Without "filters" part of route config, I can see ConnectionTimeout and client tries to connect for 30 seconds as expected. So I suspect this as a the matter of CircuitBreaker filter.

io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection timed out: no further information: /{ANY_SITE_CAUSING_CONNECTION_TIMEOUT}:8080
    Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException: 
Error has been observed at the following site(s):
    *__checkpoint ⇢ org.springframework.cloud.gateway.filter.WeightCalculatorWebFilter [DefaultWebFilterChain]
    *__checkpoint ⇢ org.springframework.boot.actuate.metrics.web.reactive.server.MetricsWebFilter [DefaultWebFilterChain]
    *__checkpoint ⇢ HTTP GET "/x" [ExceptionHandlingWebHandler]
Original Stack Trace:
Caused by: java.net.ConnectException: Connection timed out: no further information
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.8.0_292]
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716) ~[na:1.8.0_292]
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337) ~[netty-transport-4.1.79.Final.jar:4.1.79.Final]
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334) ~[netty-transport-4.1.79.Final.jar:4.1.79.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:710) ~[netty-transport-4.1.79.Final.jar:4.1.79.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658) ~[netty-transport-4.1.79.Final.jar:4.1.79.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584) ~[netty-transport-4.1.79.Final.jar:4.1.79.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[netty-transport-4.1.79.Final.jar:4.1.79.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[netty-common-4.1.79.Final.jar:4.1.79.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.79.Final.jar:4.1.79.Final]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.79.Final.jar:4.1.79.Final]
    at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_292]

I wonder why netty exception is overwritten by java.util.concurrent.TimeoutException with 1000ms interval. I tried to set spring.cloud.gateway.httpclient.connect-timeout and spring.cloud.gateway.httpclient.response-timeout to bigger values, but it didn't change the result. Per-route timeout settings also didn't work.


Solution

  • For me I just used the following time-limiter configuration:

    resilience4j:
       timelimiter:
         configs:
           default:
             timeout-duration: 5s
             cancel-running-future: false
    

    or

    @Bean
    public TimeLimiterRegistry timeLimiterRegistry() {
       return TimeLimiterRegistry.of(TimeLimiterConfig.custom()
                .timeoutDuration(Duration.ofSeconds(5))
                .cancelRunningFuture(false)
                .build());
    }
    

    or

    @Bean
    public Customizer<ReactiveResilience4JCircuitBreakerFactory> defaultCustomizer() {
        return factory -> factory.configureDefault(id -> new Resilience4JConfigBuilder(id)
                .timeLimiterConfig(TimeLimiterConfig.custom()
                        .timeoutDuration(Duration.ofSeconds(5))
                        .cancelRunningFuture(false)
                        .build())
                .build());
    }