concurrencydisruptor-patternwork-stealing

When to use the disruptor pattern and when local storage with work stealing?


Is the following correct?

(And is the disruptor pattern still so much faster than other lockless multi-producer multi-consumer queues (e.g. from boost) when multiple producers (i.e. CAS operations) are involved?)


My situation in detail:

Processing an entry can produce several new entries, which must be processed eventually, too. Performance has highest priority, entries being processed in FIFO order has second priority.

In the current implementation, each thread uses a local FIFO, where it adds its new entries. Idle threads steal work from other thread's local FIFO. Dependencies between the thread's processing are resolved using a lockless, mechanically sympathetic hash table (CASs on write, with bucket granularity). This results in pretty low contention but FIFO order is sometimes broken.

Using the disruptor pattern would guarantee FIFO order. But wouldn't distributing the entries onto the threads cause much higher contention (e.g. CAS on a read cursor) than for local FIFOs with work stealing (each thread's throughput is about the same)?


References I've found

The performance tests in the standard technical paper on the disruptor (Chapter 5 + 6) do not cover disjoint work distribution.

https://groups.google.com/forum/?fromgroups=#!topic/lmax-disruptor/tt3wQthBYd0 is the only reference I've found on disruptor + work stealing. It states that a queue per thread is dramatically slower if there is any shared state, but does not go into detail or explain why. I doubt that this sentence applies to my situation with:


Solution

  • Update - Bottom line up front for max performance: You need to write both in the idiomatic syntax for disruptor and work stealing, and then benchmark.

    To me, I think the distinction is primarily in the split between message vs task focus, and therefore in the way you want to think of the problem. Try to solve your problem, and if it is task-focused then Disruptor is a good fit. If the problem is message focused, then you might be more suited to another technique such as work stealing.

    Of course, you can change your implementation to suit each approach better.

    In your case, I would structure the problem differently if you wanted to use Disruptor. Generally you would eliminate shared state by having a single thread own the state and pass all tasks through that thread of work - look up SEDA for lots of diagrams like this. This can have lots of benefits, but again, is really down to your implementation.

    Some more verbosity: