rustrabbitmqrust-tokiorayon

Strategies for Cancelling Long-Running Rayon Tasks in Rust via RabbitMQ


I'm developing an executor in Rust to handle compute-intensive tasks as part of a larger web application. The executor receives jobs from RabbitMQ (using lapin) and processes various CPU-bound operations, which include some that internally utilize rayon. I'm aware that async can complicate rayon's usage. Post-computation or task cancellation, it communicates results or cancellation status back to RabbitMQ.

The operations in question are out of my control, consisting mainly of long-running, parallel graph algorithms. I'm exploring strategies to cancel these tasks reactively through a separate RabbitMQ cancellation queue without rewriting the underlying libraries due to their (to my understanding) non-compatibility with async patterns and the general lack of built-in task cancellation support for safety reasons.

I've prototyped an executor using tokio, only to recognize the potential issues with rayon after the fact. The current cancellation strategy is based on tokio oneshot channels, which inefficiently drop tasks without liberating CPU resources. I am looking for advice on crafting an idiomatic executor that enables effective task cancellation and conforms to Rust's safety guarantees.

Edit: I‘ll try to be more specific: Essentially, I am looking for a strategy to cancel (long-running, blocking, CPU-bound) tasks from another thread, effectively aborting the computation and freeing up the CPU. Gracefully signaling shutdown using a mechanism such as tokio‘s oneshot channel is not an option since I don‘t have control over the implementation of the long running operation.


Solution

  • The only sound way to stop a thread is for it to cooperate with that. There are multiple ways to do this, but they all involve something pervasive:

    So, given your constraints of calling into code you do not control, the most you can possibly do is let the thread continue until control returns to code you wrote (at which point you can panic!() to cleanly terminate the thread).

    Some ideas for how to proceed:


    You may wonder: why does cancelling a thread corrupt state? Two examples:

    Both of these things could potentially be addressed, by adding suitable checks to them or prohibiting them (e.g. using strictly channels and atomics, no locks), but you'd have to make sure that all of the code the thread executes obeys suitable restrictions, and Rust, being a language without a “VM” or “runtime”, does not impose such restrictions globally.