After getting routing to work with Spring Cloud Gateway, I tried to add a Circuit Breaker to handle connection timeouts to downstream services. However, it behaves differently from what I expected.
Version info (I have to use Java 8):
<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.7.3</version>
</parent>
<properties>
    <java.version>1.8</java.version>
    <spring-cloud.version>2021.0.5</spring-cloud.version>
</properties>
application.yml to reproduce the situation:
spring:
  cloud:
    gateway:
      routes:
        - id: test
          order: 0
          uri: http://{ANY_SITE_CAUSING_CONNECTION_TIMEOUT}:8080/
          predicates:
            - Path=/**
          filters:
            - name: CircuitBreaker
              args:
                name: cbTest
                fallbackUri: forward:/fallback
management:
  # (to check circuit breaker status via actuator, omitted)
resilience4j:
  circuitbreaker:
    configs:
      default:
        registerHealthIndicator: true
        minimumNumberOfCalls: 10
    instances:
      cbTest:
        baseConfig: default
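I added a fallback handler that prints the exception recorded by the circuit breaker:
import java.util.Objects;
import org.springframework.cloud.gateway.support.ServerWebExchangeUtils;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.server.ServerWebExchange;
import reactor.core.publisher.Mono;

@RestController
@RequestMapping("/fallback")
public class FallbackController {
    @GetMapping
    public Mono<Void> fallback(ServerWebExchange exchange) {
        // The CircuitBreaker filter stores the triggering exception in this exchange attribute.
        Throwable t = exchange.getAttribute(ServerWebExchangeUtils.CIRCUITBREAKER_EXECUTION_EXCEPTION_ATTR);
        Objects.requireNonNull(t).printStackTrace();
        // Return an empty Mono rather than null from a reactive handler.
        return Mono.empty();
    }
}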
This results in an unexpected java.util.concurrent.TimeoutException, shown below, and the call finishes after 1 second instead of waiting for the connection attempt to time out.
java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 1000ms in 'circuitBreaker' (and no fallback has been configured)
at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.handleTimeout(FluxTimeout.java:295)
at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.doTimeout(FluxTimeout.java:280)
at reactor.core.publisher.FluxTimeout$TimeoutTimeoutSubscriber.onNext(FluxTimeout.java:419)
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext(FluxOnErrorResume.java:79)
at reactor.core.publisher.MonoDelay$MonoDelayRunnable.propagateDelay(MonoDelay.java:271)
at reactor.core.publisher.MonoDelay$MonoDelayRunnable.run(MonoDelay.java:286)
at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68)
at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28)
at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
at java.util.concurrent.FutureTask.run(FutureTask.java)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
And Circuit Breaker records the corresponding event.
http://localhost:8080/actuator/circuitbreakerevents
{
  "circuitBreakerEvents": [
    {
      "circuitBreakerName": "cbTest",
      "type": "ERROR",
      "creationTime": "2023-02-14T10:21:12.633+09:00[Asia/Seoul]",
      "errorMessage": "java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 1000ms in 'circuitBreaker' (and no fallback has been configured)",
      "durationInMs": 1000,
      "stateTransition": null
    }
  ]
}
Without the "filters" part of the route config, I see a connection timeout and the client keeps trying to connect for 30 seconds, as expected. So I suspect the CircuitBreaker filter is the cause.
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection timed out: no further information: /{ANY_SITE_CAUSING_CONNECTION_TIMEOUT}:8080
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
*__checkpoint ⇢ org.springframework.cloud.gateway.filter.WeightCalculatorWebFilter [DefaultWebFilterChain]
*__checkpoint ⇢ org.springframework.boot.actuate.metrics.web.reactive.server.MetricsWebFilter [DefaultWebFilterChain]
*__checkpoint ⇢ HTTP GET "/x" [ExceptionHandlingWebHandler]
Original Stack Trace:
Caused by: java.net.ConnectException: Connection timed out: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.8.0_292]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716) ~[na:1.8.0_292]
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337) ~[netty-transport-4.1.79.Final.jar:4.1.79.Final]
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334) ~[netty-transport-4.1.79.Final.jar:4.1.79.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:710) ~[netty-transport-4.1.79.Final.jar:4.1.79.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658) ~[netty-transport-4.1.79.Final.jar:4.1.79.Final]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584) ~[netty-transport-4.1.79.Final.jar:4.1.79.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) ~[netty-transport-4.1.79.Final.jar:4.1.79.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[netty-common-4.1.79.Final.jar:4.1.79.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.79.Final.jar:4.1.79.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.79.Final.jar:4.1.79.Final]
at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_292]
I wonder why the Netty exception is replaced by a java.util.concurrent.TimeoutException with a 1000ms limit. I tried setting spring.cloud.gateway.httpclient.connect-timeout and spring.cloud.gateway.httpclient.response-timeout to larger values, but the result did not change. Per-route timeout settings did not help either.
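The 1000ms limit comes from the Resilience4j TimeLimiter, not from the HTTP client. Spring Cloud Circuit Breaker applies a time limiter to every call, and its default timeout is 1 second, so the gateway's connect-timeout and response-timeout never get a chance to fire. For me, raising the time-limiter timeout was enough. I used the following configuration: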
resilience4j:
  timelimiter:
    configs:
      default:
        timeout-duration: 5s
        cancel-running-future: false
or
@Bean
public TimeLimiterRegistry timeLimiterRegistry() {
    return TimeLimiterRegistry.of(TimeLimiterConfig.custom()
            .timeoutDuration(Duration.ofSeconds(5))
            .cancelRunningFuture(false)
            .build());
}
or
@Bean
public Customizer<ReactiveResilience4JCircuitBreakerFactory> defaultCustomizer() {
    return factory -> factory.configureDefault(id -> new Resilience4JConfigBuilder(id)
            .timeLimiterConfig(TimeLimiterConfig.custom()
                    .timeoutDuration(Duration.ofSeconds(5))
                    .cancelRunningFuture(false)
                    .build())
            .build());
}
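If it helps to picture why the TimeoutException shows up instead of the Netty ConnectException, here is a rough JDK-only analogy. It is not the gateway's or Resilience4j's actual code: the class TimeLimiterRace and the methods slowConnect and firstFailure are hypothetical names made up for illustration, and the durations are scaled down (300ms stands in for the 30 second connect timeout, 100ms for the 1 second time limit). The idea is simply that whichever limit is shorter decides which exception the caller sees.

```java
import java.net.ConnectException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeLimiterRace {

    // Hypothetical stand-in for a downstream call whose connect attempt
    // fails with ConnectException after connectTimeoutMs.
    static CompletableFuture<String> slowConnect(ScheduledExecutorService scheduler,
                                                 long connectTimeoutMs) {
        CompletableFuture<String> result = new CompletableFuture<>();
        scheduler.schedule(() -> {
            result.completeExceptionally(new ConnectException("Connection timed out"));
        }, connectTimeoutMs, TimeUnit.MILLISECONDS);
        return result;
    }

    // Waits at most limitMs (the "time limiter") and reports which failure
    // the caller observes first, as a simple class name.
    static String firstFailure(long connectTimeoutMs, long limitMs) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        try {
            slowConnect(scheduler, connectTimeoutMs).get(limitMs, TimeUnit.MILLISECONDS);
            return "none";
        } catch (TimeoutException e) {
            // The wait limit expired before the connect attempt failed.
            return "TimeoutException";
        } catch (ExecutionException e) {
            // The connect attempt failed first; unwrap the real cause.
            return e.getCause().getClass().getSimpleName();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Limit shorter than the connect timeout: the limiter wins.
        System.out.println(firstFailure(300, 100)); // prints TimeoutException
        // Limit longer than the connect timeout: the connect error surfaces.
        System.out.println(firstFailure(300, 500)); // prints ConnectException
    }
}
```

By the same reasoning, a 5s time limit still fires before a 30 second connect timeout. If you want the Netty exception to reach the fallback, the time limit has to be longer than the connect timeout, or the connect timeout has to be shorter than the time limit.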