The idea is to run a parallel job on a 96-core machine, using a work-stealing ForkJoinPool.
Below is the code I'm using so far:
import scala.collection.parallel.{ForkJoinTaskSupport, ParSeq}
import scala.concurrent.forkjoin.ForkJoinPool

val sequence: ParSeq[Item] = getItems().par
// ForkJoinPool() defaults to one worker per available core
sequence.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool())
val results = for {
item <- sequence
res = doSomethingWith(item)
} yield res
Here, sequence has about 20,000 items. Most items take 2 to 8 seconds to process; only about 200 of them take longer, around 40 seconds.
The problem:
Everything runs fine; however, the work stealing doesn't seem to work well. Here is the expected total CPU load (black) compared to the actual load (blue) over time:
Looking at the CPU activity, it's clear that fewer and fewer cores are used as the job progresses towards the end. During the last few minutes, only 2 or 3 cores are still busy, processing dozens of items sequentially, one after the other.
How come the items still in the queue don't get stolen by the other, free cores, even when using a ForkJoinPool, which is supposed to be work-stealing?
https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ForkJoinPool.html
Each worker thread has its own internal task queue; a worker processes its own queue first and only steals from another worker's queue when it runs out of local work, which limits interactions between workers.
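As a rough illustration of that granularity, here is a minimal fork/join sketch using the plain java.util.concurrent API (the threshold of 8 is hypothetical; the real split sizes used by parallel collections are adaptive). A range task splits until it reaches the threshold, and each leaf range is then processed sequentially by whichever worker runs it: idle workers can steal unstarted subtasks, but never individual items inside a leaf that is already executing.

```scala
import java.util.concurrent.{ForkJoinPool, RecursiveTask}

val THRESHOLD = 8 // hypothetical leaf size; real splitting is adaptive

class SumRange(lo: Int, hi: Int) extends RecursiveTask[Long] {
  override def compute(): Long =
    if (hi - lo <= THRESHOLD) {
      // Leaf batch: processed sequentially by one worker; its items
      // can no longer be stolen individually.
      (lo until hi).map(_.toLong).sum
    } else {
      val mid = (lo + hi) / 2
      val left = new SumRange(lo, mid)
      left.fork() // queued on this worker's deque; stealable while unstarted
      val rightResult = new SumRange(mid, hi).compute()
      rightResult + left.join()
    }
}

val pool = new ForkJoinPool()
val total = pool.invoke(new SumRange(0, 100)) // sum of 0..99 = 4950
```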
This probably explains the behavior you're seeing: the parallel collection is recursively split into batches, and once a worker has started executing a batch, its items are processed sequentially and can no longer be stolen individually. The effect is especially noticeable if the long tasks in your item set aren't randomly distributed.
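Since each of your items takes seconds, one way to sidestep the batching entirely is to submit one task per item, for example with Futures over a ForkJoinPool-backed execution context. This is a sketch, not your exact setup: Item and doSomethingWith below are hypothetical stand-ins for the question's types.

```scala
import java.util.concurrent.ForkJoinPool
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Hypothetical stand-ins for the question's Item and doSomethingWith.
case class Item(id: Int)
def doSomethingWith(item: Item): Int = item.id * 2

val items = (1 to 100).map(Item(_)).toList

// One Future per item: every item is an independent task in the pool,
// so any idle worker can pick up the next queued item, and no worker
// is left alone with a private batch of slow items.
implicit val ec: ExecutionContext =
  ExecutionContext.fromExecutorService(new ForkJoinPool())

val results: List[Int] =
  Await.result(Future.traverse(items)(item => Future(doSomethingWith(item))), Duration.Inf)
```

The trade-off is per-item scheduling overhead, which is negligible here given that each item runs for seconds.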