
Detailed difference between Java8 ForkJoinPool and Executors.newWorkStealingPool?


What is the low-level difference between using:

ForkJoinPool pool = new ForkJoinPool(X);

and

ExecutorService ex = Executors.newWorkStealingPool(X);

Where X is the desired level of parallelism, i.e. threads running.

From the docs, the two look similar to me. Also, which one is more appropriate and safer for normal use?

I have 130 million entries to write through a BufferedWriter and then sort by the first column using Unix sort.

Also let me know how many threads to keep if possible.

Note: My system has an 8-core processor and 32 GB RAM.


Solution

  • Work stealing is a technique used by modern thread-pools in order to decrease contention on the work queue.

    A classical thread pool has one queue: each pool thread locks the queue, dequeues a task and then unlocks the queue. If the tasks are short and there are many of them, there is a lot of contention on the queue. Using a lock-free queue really helps here, but doesn't solve the problem entirely.

    Modern thread pools use work stealing: each thread has its own queue. When a pool thread produces a task, it enqueues it in its own queue. When a pool thread wants a task, it first tries to dequeue one from its own queue, and if that queue is empty, it "steals" work from the queues of other threads. This greatly decreases contention inside the pool and improves performance.
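The take-or-steal rule described above can be sketched in a few lines. This is only an illustration of the idea, not how the JDK implements it: the class `WorkStealingSketch` and the method `nextTask` are hypothetical names, and the choice to pop from the tail of the worker's own deque while stealing from the head of another is an assumption borrowed from common work-stealing designs.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;

public class WorkStealingSketch {

    /**
     * Returns the next task for worker `self`, or null if every queue is empty.
     * queues.get(i) is the deque owned by worker i (hypothetical layout).
     */
    static <T> T nextTask(int self, List<? extends Deque<T>> queues) {
        // 1. Prefer the worker's own queue; take from the tail (most recent task).
        T task = queues.get(self).pollLast();
        if (task != null) {
            return task;
        }
        // 2. Own queue is empty: try to steal from the head of another queue,
        //    the end that its owner is not working on.
        for (int i = 0; i < queues.size(); i++) {
            if (i == self) {
                continue;
            }
            task = queues.get(i).pollFirst();
            if (task != null) {
                return task;
            }
        }
        return null; // nothing to run anywhere
    }

    public static void main(String[] args) {
        List<Deque<String>> queues = new ArrayList<>();
        queues.add(new ConcurrentLinkedDeque<>()); // worker 0: idle
        queues.add(new ConcurrentLinkedDeque<>()); // worker 1: has work
        queues.get(1).addLast("task-a");
        queues.get(1).addLast("task-b");

        System.out.println(nextTask(0, queues)); // worker 0 steals: task-a
        System.out.println(nextTask(1, queues)); // worker 1 pops its own: task-b
    }
}
```

Because the owner and the thief touch opposite ends of a deque, they rarely compete for the same element, which is where the reduced contention comes from.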

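    newWorkStealingPool() with no argument creates a work-stealing thread pool whose target parallelism is the number of available processors; newWorkStealingPool(X) uses X instead. In the OpenJDK implementation the returned pool is itself a ForkJoinPool created in async (FIFO) mode, whereas new ForkJoinPool(X) uses the default LIFO mode for each worker's own queue.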
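One way to see what each factory call actually hands back is to inspect the returned object. The sketch below assumes an OpenJDK-based runtime, where `Executors.newWorkStealingPool` is backed by a `ForkJoinPool`; the API contract only promises an `ExecutorService`, so the `instanceof` check is there on purpose. `PoolComparison` and `describe` are hypothetical names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;

public class PoolComparison {

    /** Summarises a pool; falls back to the class name if it is not a ForkJoinPool. */
    static String describe(ExecutorService ex) {
        if (ex instanceof ForkJoinPool) {
            ForkJoinPool p = (ForkJoinPool) ex;
            return "ForkJoinPool parallelism=" + p.getParallelism()
                    + " asyncMode=" + p.getAsyncMode();
        }
        return ex.getClass().getName();
    }

    public static void main(String[] args) {
        ForkJoinPool direct = new ForkJoinPool(4);
        ExecutorService stealing = Executors.newWorkStealingPool(4);
        try {
            // The one-argument constructor uses the default (non-async, LIFO) mode.
            System.out.println(describe(direct));
            // On OpenJDK this is expected to report asyncMode=true (FIFO).
            System.out.println(describe(stealing));
        } finally {
            direct.shutdown();
            stealing.shutdown();
        }
    }
}
```

Async mode is documented as being intended for event-style tasks that are submitted and never joined, which is the main practical difference between the two calls.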

    A pool with only as many threads as cores presents a new problem. If I have four logical cores, the pool will have four threads in total. If my tasks block, for example on synchronous IO, I don't use my CPUs fully. What I want is four active threads at any given moment: for example, four threads encrypting with AES while another 140 threads wait for IO to finish.

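    This is what ForkJoinPool provides: if a task spawns new tasks and then waits for them to finish with join(), the pool can add compensating threads in order to keep the CPUs saturated. Blocking that the pool cannot see, such as plain synchronous IO, does not trigger this unless it is wrapped in a ForkJoinPool.ManagedBlocker. It is worth mentioning that ForkJoinPool uses work stealing too.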
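To make "a task spawns new tasks and waits for them" concrete, here is a minimal fork-join sketch that sums an array. The names `ForkJoinSum`, `SumTask` and the `THRESHOLD` value are illustrative assumptions, not anything from the question; only `ForkJoinPool`, `RecursiveTask`, `fork()`, `join()` and `invoke()` are real JDK API.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class ForkJoinSum {

    // Below this size a task sums directly instead of splitting (arbitrary choice).
    static final int THRESHOLD = 10_000;

    static class SumTask extends RecursiveTask<Long> {
        private final long[] data;
        private final int from;
        private final int to; // exclusive

        SumTask(long[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {
                long sum = 0;
                for (int i = from; i < to; i++) {
                    sum += data[i];
                }
                return sum;
            }
            int mid = (from + to) >>> 1;
            SumTask left = new SumTask(data, from, mid);
            SumTask right = new SumTask(data, mid, to);
            left.fork();                   // queue the left half; another worker may steal it
            long rightSum = right.compute(); // do the right half in this thread
            return left.join() + rightSum; // wait for (or help finish) the left half
        }
    }

    static long sum(long[] data, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.invoke(new SumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        System.out.println(sum(data, 4)); // prints 499999500000
    }
}
```

The `join()` call is the point where the pool knows a worker is waiting on a subtask, so it can have that worker run other queued tasks or compensate with another thread rather than leave a core idle.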

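    Which one to use? If you work with the fork-join model (tasks that fork subtasks and join them), or you know your tasks may wait a long time on other tasks, use the ForkJoinPool. If your tasks are short, mostly CPU-bound and submitted independently without being joined, use newWorkStealingPool.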

    All that said, modern applications tend to use a thread pool sized to the number of available processors and rely on asynchronous IO and lock-free containers to avoid blocking. This (usually) gives the best performance.
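A rough sketch of that last pattern, under some assumptions: the pool is sized with `Runtime.availableProcessors()`, the work is chained with `CompletableFuture` so that pool threads never wait on each other, and results are collected in a `LongAdder` (a CAS-based, low-contention accumulator standing in for a lock-free container). `CoreSizedPool` and `sumOfSquares` are hypothetical names, and the squaring is a placeholder for real CPU-bound work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.LongAdder;

public class CoreSizedPool {

    static long sumOfSquares(int n) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        LongAdder total = new LongAdder(); // accumulates without a shared lock
        try {
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final long v = i;
                futures.add(CompletableFuture
                        .supplyAsync(() -> v * v, pool) // CPU-bound step on the pool
                        .thenAccept(total::add));       // continuation, no blocking wait
            }
            // Only the calling thread waits here; the pool threads never block.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
            return total.sum();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(100)); // prints 338350
    }
}
```

With real IO in the mix, the same shape applies as long as the IO step is itself asynchronous and completes a future instead of parking a pool thread.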