In doFinally I printed the signal type and confirmed that it was a cancellation (CANCEL). However, doOnError does not capture it, since a cancellation is technically not an error.
Root causes I can see why cancellation occurs is because
However, unlike doOnError(Throwable t -> {}), doOnCancel(() -> {}) does not provide a stack trace, so I cannot tell which operation caused the cancel.
This is complex code that invokes multiple library methods, so it's not straightforward to trace the cancellation by reading the code. It happens only in our production environment and in no other environment. Our primary suspect is the resource and memory limits.
But is there a way to trace the root cause of the cancel signal, such as which method or operation triggered it?
I'm not sure if using Hooks.onOperatorDebug() would be wise.
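One approach that avoids global hooks is to capture a stack trace at the moment cancel is called. Cancellation travels upstream synchronously from whoever calls cancel(), so the call stack at that point normally includes the operator or framework code that initiated it. In Reactor that would mean passing a new Throwable() to your logger inside doOnCancel. The sketch below shows the same idea without any dependencies, using the JDK's java.util.concurrent.Flow interfaces instead of Reactor; TracingSubscription, hypotheticalTimeoutStage, and cancelledBy are made-up names for illustration only.

```java
import java.util.Arrays;
import java.util.concurrent.Flow;
import java.util.concurrent.atomic.AtomicReference;

public class CancelOriginDemo {

    /** Wraps a subscription and records the call stack of the first cancel(). */
    static final class TracingSubscription implements Flow.Subscription {
        private final Flow.Subscription delegate;
        final AtomicReference<Throwable> cancelOrigin = new AtomicReference<>();

        TracingSubscription(Flow.Subscription delegate) {
            this.delegate = delegate;
        }

        @Override
        public void request(long n) {
            delegate.request(n);
        }

        @Override
        public void cancel() {
            // The Throwable is never thrown; it only carries the stack trace.
            cancelOrigin.compareAndSet(null, new Throwable("cancel() called from"));
            delegate.cancel();
        }
    }

    /** Hypothetical downstream stage that gives up, e.g. because of a timeout. */
    static void hypotheticalTimeoutStage(Flow.Subscription subscription) {
        subscription.cancel();
    }

    /** Returns true if the captured stack contains a frame for the given method. */
    static boolean cancelledBy(Throwable origin, String methodName) {
        return Arrays.stream(origin.getStackTrace())
                .anyMatch(frame -> frame.getMethodName().equals(methodName));
    }

    /** Runs the demo and returns the captured cancel origin. */
    static Throwable run() {
        Flow.Subscription noop = new Flow.Subscription() {
            @Override public void request(long n) { }
            @Override public void cancel() { }
        };
        TracingSubscription traced = new TracingSubscription(noop);
        hypotheticalTimeoutStage(traced);
        return traced.cancelOrigin.get();
    }

    public static void main(String[] args) {
        // Prints a stack trace whose frames include hypotheticalTimeoutStage.
        run().printStackTrace();
    }
}
```

Logging the captured Throwable in production should be cheap enough for an occasional cancel, since the stack is only walked when a cancellation actually happens.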
Found the root cause.
The operation was being cancelled because the HTTP handler method in the controller returned the reactive method's output directly instead of subscribing to it explicitly.
The controller looked like this:
@PostMapping("/mymethod")
public Mono<Void> myMethod(InputParams ip) {
    return reactorService.method(ip); // the framework subscribes and ties the pipeline to the HTTP request
}
This tied the HTTP response to the reactive pipeline: the response waited for the reactor result, and the pipeline was cancelled when the HTTP request timed out.
The issue was resolved by subscribing to the reactive pipeline from the controller and immediately returning a dummy HTTP success response.
@PostMapping("/mymethod")
public ResponseEntity<MyType> myMethod(InputParams ip) {
    reactorService.method(ip).subscribe(); // executes async, detached from the HTTP request
    return ResponseEntity.ok().build(); // or MyResponseBuilder.setInputs(ip).build();
}
This way the HTTP response is returned early and the reactive pipeline runs asynchronously, which resolved the issue.