A small question regarding a Spring WebFlux reactive Cassandra application, please.
On a Spring Boot 2.6.4 setup with WebFlux and reactive Cassandra, I am using the app to insert some data into Cassandra tables.
Things work fine until the load gets higher, at which point I am seeing an issue (stack trace attached below).
Reading some documentation, I thought this piece of code would help solve the issue:
@Bean
public DriverConfigLoaderBuilderCustomizer defaultProfile() {
    return builder -> builder
            .withInt(DefaultDriverOption.METADATA_SCHEMA_REQUEST_TIMEOUT, 5000)
            .build();
}
(Here is my Cassandra config; note I am using CqlSession.builder().)
protected CqlSession cqlLocalSession() {
    return CqlSession.builder()
            .addContactEndPoint(new DefaultEndPoint(new InetSocketAddress(contactPoints, port)))
            .withKeyspace(keyspace)
            .withLocalDatacenter(datacenter)
            .withAuthCredentials(username, passPhrase)
            .build();
}
Unfortunately, the DriverConfigLoaderBuilderCustomizer does not seem to work: I am still getting Query timed out after PT2S (I would at least have expected to see PT5S).
May I ask what the issue is, and how to properly increase the timeout, please?
Thank you
org.springframework.data.cassandra.CassandraUncategorizedException: ReactivePreparedStatementCallback; CQL [INSERT INTO mutable (x,y,z) VALUES (?,?,?)]; Query timed out after PT2S; nested exception is com.datastax.oss.driver.api.core.DriverTimeoutException: Query timed out after PT2S
at org.springframework.data.cassandra.core.cql.CassandraExceptionTranslator.translate(CassandraExceptionTranslator.java:162) ~[spring-data-cassandra-3.3.2.jar!/:3.3.2]
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
*__checkpoint ⇢ Handler myHandler [DispatcherHandler]
Original Stack Trace:
at org.springframework.data.cassandra.core.cql.CassandraExceptionTranslator.translate(CassandraExceptionTranslator.java:162) ~[spring-data-cassandra-3.3.2.jar!/:3.3.2]
at org.springframework.data.cassandra.core.cql.ReactiveCassandraAccessor.translate(ReactiveCassandraAccessor.java:149) ~[spring-data-cassandra-3.3.2.jar!/:3.3.2]
at org.springframework.data.cassandra.core.cql.ReactiveCqlTemplate.lambda$translateException$15(ReactiveCqlTemplate.java:820) ~[spring-data-cassandra-3.3.2.jar!/:3.3.2]
at reactor.core.publisher.Flux.lambda$onErrorMap$28(Flux.java:6911) ~[reactor-core-3.4.15.jar!/:3.4.15]
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:94) ~[reactor-core-3.4.15.jar!/:3.4.15]
at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onError(ScopePassingSpanSubscriber.java:96) ~[spring-cloud-sleuth-instrumentation-3.1.1.jar!/:3.1.1]
at reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain.onError(MonoFlatMapMany.java:204) ~[reactor-core-3.4.15.jar!/:3.4.15]
at reactor.core.publisher.FluxContextWrite$ContextWriteSubscriber.onError(FluxContextWrite.java:121) ~[reactor-core-3.4.15.jar!/:3.4.15]
at org.springframework.cloud.sleuth.instrument.reactor.ScopePassingSpanSubscriber.onError(ScopePassingSpanSubscriber.java:96) ~[spring-cloud-sleuth-instrumentation-3.1.1.jar!/:3.1.1]
at reactor.core.publisher.MonoCompletionStage.lambda$subscribe$0(MonoCompletionStage.java:80) ~[reactor-core-3.4.15.jar!/:3.4.15]
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[na:na]
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) ~[na:na]
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[na:na]
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) ~[na:na]
at com.datastax.oss.driver.internal.core.cql.CqlPrepareAsyncProcessor.lambda$process$1(CqlPrepareAsyncProcessor.java:71) ~[java-driver-core-4.13.0.jar!/:na]
at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[na:na]
at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) ~[na:na]
at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[na:na]
at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) ~[na:na]
at com.datastax.oss.driver.internal.core.cql.CqlPrepareHandler.setFinalError(CqlPrepareHandler.java:320) ~[java-driver-core-4.13.0.jar!/:na]
at com.datastax.oss.driver.internal.core.cql.CqlPrepareHandler.lambda$scheduleTimeout$1(CqlPrepareHandler.java:163) ~[java-driver-core-4.13.0.jar!/:na]
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715) ~[netty-common-4.1.74.Final.jar!/:4.1.74.Final]
at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34) ~[netty-common-4.1.74.Final.jar!/:4.1.74.Final]
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703) ~[netty-common-4.1.74.Final.jar!/:4.1.74.Final]
at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790) ~[netty-common-4.1.74.Final.jar!/:4.1.74.Final]
at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503) ~[netty-common-4.1.74.Final.jar!/:4.1.74.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.74.Final.jar!/:4.1.74.Final]
at java.base/java.lang.Thread.run(Thread.java:829) ~[na:na]
Caused by: com.datastax.oss.driver.api.core.DriverTimeoutException: Query timed out after PT2S
at com.datastax.oss.driver.internal.core.cql.CqlPrepareHandler.lambda$scheduleTimeout$1(CqlPrepareHandler.java:163) ~[java-driver-core-4.13.0.jar!/:na]
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715) ~[netty-common-4.1.74.Final.jar!/:4.1.74.Final]
at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34) ~[netty-common-4.1.74.Final.jar!/:4.1.74.Final]
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703) ~[netty-common-4.1.74.Final.jar!/:4.1.74.Final]
at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790) ~[netty-common-4.1.74.Final.jar!/:4.1.74.Final]
at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503) ~[netty-common-4.1.74.Final.jar!/:4.1.74.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.74.Final.jar!/:4.1.74.Final]
at java.base/java.lang.Thread.run(Thread.java:829) ~[na:na]
You have configured the wrong option on the driver. METADATA_SCHEMA_REQUEST_TIMEOUT
is the timeout for the requests to retrieve the schema.
The request timeout for the Java driver is controlled by basic.request.timeout, which defaults to 2 seconds (see the driver's reference configuration):
datastax-java-driver {
  basic.request {
    timeout = 2 seconds
  }
}
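If the default file-based configuration loader is in effect, you can override this in an application.conf on your classpath, for example (5 seconds is purely an illustrative value, not a recommendation):
datastax-java-driver {
  basic.request {
    timeout = 5 seconds
  }
}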
The equivalent typed driver option for basic.request.timeout is REQUEST_TIMEOUT (see TypedDriverOption.java).
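As a minimal sketch of how your customizer could set the request timeout programmatically instead (the bean name and the 5-second value are just examples, not recommendations):
import java.time.Duration;
import com.datastax.oss.driver.api.core.config.DefaultDriverOption;
import org.springframework.boot.autoconfigure.cassandra.DriverConfigLoaderBuilderCustomizer;

@Bean
public DriverConfigLoaderBuilderCustomizer requestTimeout() {
    // Raise basic.request.timeout (DefaultDriverOption.REQUEST_TIMEOUT) to 5 seconds.
    // Note there is no .build() here; Spring Boot builds the config loader itself.
    return builder -> builder.withDuration(DefaultDriverOption.REQUEST_TIMEOUT, Duration.ofSeconds(5));
}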
I should point out that increasing the request timeout is almost never the correct way of handling the issue you're facing, because you're just hiding the problem without actually addressing the root cause.
As you already pointed out at the start of your post, the problem arises when the load is higher. The reason you're getting DriverTimeoutException is that the coordinators do not respond in time, and it's not difficult to conclude that this is because the nodes are overloaded.
Writes in Cassandra are very fast because mutations are appended to the end of the commitlog -- there are no disk seeks. If the disk for the commitlog is slow or cannot keep up with the writes, you need to check that the commitlog is on a separate disk/volume so it's not competing for the same IO as reads,