Many times I've heard that it is better to maintain the number of threads in a thread pool below the number of cores in that system. Having twice or more threads than the number of cores is not only a waste, but also could cause performance degradation.
Are those true? If not, what are the fundamental principles that debunk those claims (specifically relating to java)?
Many times I've heard that it is better to maintain the number of threads in a thread pool below the number of cores in that system. Having twice or more threads than the number of cores is not only a waste, but also could cause performance degradation.
The claims are not true as a general statement. That is to say, sometimes they are true (or true-ish) and other times they are patently false.
A couple things are indisputably true:
More threads means more memory usage. Each thread requires a thread stack. For recent HotSpot JVMs, the minimum thread stack size is 64Kb, and the default can be as much as 1Mb. That can be significant. In addition, any thread that is alive is likely to own or share objects in the heap whether or not it is currently runnable. Therefore it is reasonable to expect that more threads means a larger memory working set.
A JVM cannot have more threads actually running than there are cores (or hyperthread cores or whatever) on the execution hardware. A car won't run without an engine, and a thread won't run without a core.
Beyond that, things get less clear cut. The "problem" is that a live thread can in a variety of "states". For instance:
The "one thread per core" heuristic assumes that threads are either running or runnable (according to the above). But for a lot of multi-threaded applications, the heuristic is wrong ... because it doesn't take account of threads in the other states.
Now "too many" threads clearly can cause significant performance degradation, simple by using too much memory. (Imagine that you have 4Gb of physical memory and you create 8,000 threads with 1Mb stacks. That is a recipe for virtual memory thrashing.)
But what about other things? Can having too many threads cause excessive context switching?
I don't think so. If you have lots of threads, and your application's use of those threads can result in excessive context switches, and that is bad for performance. However, I posit that the root cause of the context switched is not the actual number of threads. The root of the performance problems are more likely that the application is:
Object.notifyAll()
when Object.notify()
would be better, OR(In the last case, the bottleneck is likely to be the I/O system rather than context switches ... unless the I/O is IPC with services / programs on the same machine.)
The other point is that in the absence of the confounding factors above, having more threads is not going to increase context switches. If your application has N runnable threads competing for M processors, and the threads are purely computational and contention free, then the OS'es thread scheduler is going to attempt to time-slice between them. But the length of a timeslice is likely to be measured in tenths of a second (or more), so that the context switch overhead is negligible compared with the work that a CPU-bound thread actually performs during its slice. And if we assume that the length of a time slice is constant, then the context switch overhead will be constant too. Adding more runnable threads (increasing N) won't change the ratio of work to overhead significantly.
In summary, it is true that "too many threads" is harmful for performance. However, there is no reliable universal "rule of thumb" for how many is "too many". And (fortunately) you generally have considerable leeway before the performance problems of "too many" become significant.