java spring-webflux reactive-programming project-reactor

How to find the root cause / exception stacktrace of signal type CANCEL


In doFinally I printed the signal type and confirmed that it was a cancellation (CANCEL). However, doOnError does not capture it, because a cancellation is technically not an error.

The possible causes of a cancellation that I am aware of are:

  1. Timeouts: If you use .timeout(), cancellation will occur if the timeout elapses.
  2. Manual disposal: If you call .dispose() or .cancel() on the subscription.
  3. Application shutdown: If the app or thread is stopped.
  4. Backpressure: If downstream cancels due to resource limits.
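To make the first cause concrete, here is a minimal sketch (the class and method names are made up for illustration) in which timeout() cancels its upstream. The doFinally placed upstream of timeout() observes CANCEL, while the downstream side receives a TimeoutException as an ordinary error. It assumes only reactor-core on the classpath.

```java
import java.time.Duration;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

import reactor.core.publisher.Mono;
import reactor.core.publisher.SignalType;

public class CancelOnTimeoutDemo {

    /** Returns the signal that doFinally observed on the upstream side of timeout(). */
    static SignalType upstreamSignalAfterTimeout() {
        AtomicReference<SignalType> seen = new AtomicReference<>();

        Mono.never()                                  // a source that never emits
            .doFinally(seen::set)                     // sits upstream of timeout()
            .timeout(Duration.ofMillis(100))          // cancels upstream, errors downstream
            .onErrorResume(TimeoutException.class, e -> Mono.empty())
            .block();                                 // returns once the timeout has fired

        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println("upstream saw: " + upstreamSignalAfterTimeout());
    }
}
```

Note that a doFinally placed downstream of timeout() would see an error signal instead, so where the operator sits in the chain determines which signal it reports.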

However, unlike doOnError((Throwable t) -> {}), doOnCancel(() -> {}) receives no argument, so it provides no stack trace from which to find out which operation caused the cancel.
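One possible workaround, sketched here with made-up names and not taken from any official recipe, is to create a Throwable inside doOnCancel purely to record the call stack at the moment cancel() is invoked. The frames then show which operator called cancel() on the current thread. If the cancel crossed a thread boundary on its way upstream, the trace only shows the last hop, so the hook may need to be placed at several points in the chain.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.time.Duration;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

import reactor.core.publisher.Mono;

public class CancelTraceDemo {

    /** Provokes a cancel via timeout() and returns the stack trace captured in doOnCancel. */
    static String traceOfCancel() {
        AtomicReference<Throwable> origin = new AtomicReference<>();

        Mono.never()
            // The Throwable is never thrown; it only records the current call stack.
            .doOnCancel(() -> origin.set(new Throwable("cancel observed here")))
            .timeout(Duration.ofMillis(100))
            .onErrorResume(TimeoutException.class, e -> Mono.empty())
            .block();

        StringWriter out = new StringWriter();
        origin.get().printStackTrace(new PrintWriter(out, true));
        return out.toString();
    }

    public static void main(String[] args) {
        // In real code this would go to a logger, e.g. log.warn("cancelled", throwable).
        System.out.println(traceOfCancel());
    }
}
```

In this sketch the cancel comes from the timeout operator, so the printed frames should lead back to Reactor's timeout classes; in production code the same hook would point at whatever component actually issued the cancel.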

This is complex code in which multiple library methods are invoked, so it's not straightforward to trace through the code. The cancellation happens only in our production environment and in no other environment. Our primary suspect is the resource and memory limits.

But is there a way to trace the root cause of the cancel signal, i.e. which method or operation triggered it?

I'm not sure if using Hooks.onOperatorDebug() would be wise.


Solution

  • Found the root cause.

    The operation was being cancelled because the controller's HTTP handler method returned the reactive method's output (the Mono) directly instead of subscribing to it explicitly.

    The controller looked like this:

    @PostMapping("/mymethod")
    public Mono<Void> myMethod(InputParams ip) {
        // The framework subscribes to the returned Mono, tying it to the HTTP request.
        return reactorService.method(ip);
    }
    

    This made the HTTP response wait for the reactive pipeline to complete, so the pipeline was cancelled when the HTTP request timed out.

    The issue was resolved by subscribing to the reactive pipeline explicitly in the controller and immediately returning a placeholder HTTP success response.

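    @PostMapping("/mymethod")
    public ResponseEntity<MyType> myMethod(InputParams ip) {
        // Explicit subscription: runs asynchronously, detached from the HTTP request.
        reactorService.method(ip).subscribe();
        // ResponseEntity.ok() returns a builder, so build() is needed for an empty 200 response.
        return ResponseEntity.ok().build(); // or MyResponseBuilder.setInputs(ip).build();
    }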
    

    This way the HTTP response is returned early and the reactive pipeline runs asynchronously, which resolved the issue.
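The difference between the two controller variants can be reproduced without Spring. The sketch below is only an illustration under assumptions: slowWork is a hypothetical stand-in for reactorService.method(ip), and timeout() plays the role of the HTTP timeout. Work tied to an impatient caller is cancelled before it finishes, while work subscribed on its own runs to completion. Because nobody downstream sees the outcome of a detached subscription, the sketch also passes an error consumer to subscribe(...) so that failures stay visible.

```java
import java.time.Duration;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

import reactor.core.publisher.Mono;

public class DetachedSubscriptionDemo {

    /** Hypothetical stand-in for reactorService.method(ip): finishes after 300 ms. */
    static Mono<Void> slowWork(AtomicBoolean finished) {
        return Mono.delay(Duration.ofMillis(300))
                   .doOnNext(tick -> finished.set(true))
                   .then();
    }

    /** Work tied to a caller that gives up after 50 ms, like a request that times out. */
    static boolean finishesWhenTiedToCaller() {
        AtomicBoolean finished = new AtomicBoolean();
        slowWork(finished)
            .timeout(Duration.ofMillis(50))           // cancels slowWork when it fires
            .onErrorResume(TimeoutException.class, e -> Mono.empty())
            .block();
        pause(500);                                   // give cancelled work a chance to (not) finish
        return finished.get();
    }

    /** Work subscribed on its own, like the explicit subscribe() in the controller. */
    static boolean finishesWhenDetached() {
        AtomicBoolean finished = new AtomicBoolean();
        slowWork(finished).subscribe(
            unused -> { },                            // Mono<Void> never emits a value
            error -> System.err.println("background work failed: " + error));
        pause(600);                                   // the caller has long since "returned"
        return finished.get();
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("tied to caller: " + finishesWhenTiedToCaller());
        System.out.println("detached:       " + finishesWhenDetached());
    }
}
```

The trade-off of the detached variant is that the HTTP client gets a success response before the work has run, so it never learns about a later failure; that has to be reported through logs, metrics, or a separate status mechanism.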