parallel-processing project-loom

May I have Project Loom Clarified?


Brian Goetz got me excited about Project Loom and, in order to fully appreciate it, I need some clarification on the status quo.

My understanding is as follows: Currently, in order to have real parallelism, we need to have a thread per cpu/core.

  1. Is there then any point in having n+1 threads on an n-core machine? Project Loom will bring us virtually limitless threads/fibres, by relying on the jvm to carry out a task on a virtual thread, inside the JVM.
  2. Will that be truly parallel?
  3. How, specifically, will that differ from the aforementioned scenario "n+1 threads on an n-core machine "?

Thanks for your time.


Solution

  • Two good Answers already. Here are a few more thoughts.

    Project Loom will bring us virtually limitless threads/fibres,

    Yes, having a million threads running concurrently on conventional computer hardware becomes practical with Project Loom’s virtual threads.

    See JEP 444. See also the video presentations by Ron Pressler, Alan Bateman, and José Paumard.
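    As a rough illustration of the "many threads" claim, here is a minimal sketch that starts one virtual thread per task. It assumes Java 21 or later, where virtual threads are a final feature (JEP 444). The class name, method name, task count, and 50 ms sleep are made up for the example; the sleep only stands in for blocking I/O.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class ManyVirtualThreads {

    // Starts `count` virtual threads that each block briefly,
    // waits for all of them, and returns how many completed.
    public static int runTasks(int count) {
        AtomicInteger completed = new AtomicInteger();
        // Creates a new virtual thread for every submitted task.
        // close() (called by try-with-resources) waits for the tasks to finish.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < count; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(Duration.ofMillis(50)); // stand-in for blocking I/O
                        completed.incrementAndGet();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println(runTasks(100_000) + " tasks completed");
    }
}
```

    Trying the same thing with one platform thread per task would likely exhaust memory or hit OS limits well before this count, though the exact ceiling depends on the machine and its configuration.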

    by relying on the jvm to carry out a task on a virtual thread, inside the JVM.

    No, not inside the JVM.

    The JVM itself runs as a process on the host operating system. Any Java code you run, on any thread, ultimately executes on a platform thread (a thread created and managed by the host operating system).

    A Java virtual thread, while executing, is assigned to a platform thread. This is called mounting the virtual thread onto a platform thread (its carrier), like a rider mounted on a horse.

    The trick with virtual threads is that the JVM can detect when Java code running in a virtual thread becomes blocked (waiting for I/O etc.), and can then rapidly switch out that blocked virtual thread for another one. In other words, the JVM dismounts the blocked virtual thread from the platform thread, and then mounts a different pending virtual thread for execution. When the first virtual thread is ready to work again, the JVM mounts it onto an available platform thread, which is not necessarily the one it ran on before.

    If you really want to know the nitty-gritty of how this rapid swapping out of virtual threads (the dismounting/mounting) happens, see the presentation by Ron Pressler on continuations. However, understanding this topic is entirely optional, unneeded for day-to-day use of virtual threads.
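    The effect of that dismounting and mounting can be observed from the outside with a small timing experiment. The sketch below (assuming Java 21 or later; the class and method names are invented for illustration) starts many virtual threads that all block in `Thread.sleep` and measures the total wall-clock time. Because a sleeping virtual thread gives up its platform thread, the total should stay far below the sum of all the sleeps.

```java
import java.util.ArrayList;
import java.util.List;

class BlockingDemo {

    // Starts `count` virtual threads that each sleep for `sleepMillis`,
    // waits for all of them, and returns the elapsed wall-clock time in ms.
    public static long elapsedMillis(int count, long sleepMillis) {
        long start = System.nanoTime();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            threads.add(Thread.startVirtualThread(() -> {
                try {
                    // While sleeping, the virtual thread is dismounted,
                    // freeing its platform thread for other virtual threads.
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long ms = elapsedMillis(1_000, 100);
        System.out.println("1,000 threads x 100 ms sleep took " + ms + " ms");
    }
}
```

    If the sleeps ran one after another, 1,000 of them at 100 ms each would need about 100 seconds. The measured figure will vary by machine, so no specific number is claimed here.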

    Will that be truly parallel?

    In theory, yes. If by "parallel" you mean two or more tasks simultaneously executing, then yes, with multiple CPU cores, you can have two or more Java tasks simultaneously executing.

    But in practice, keep in mind that ultimately the host operating system is in charge of scheduling the execution of its threads. Neither the Java programmer nor the JVM has any way of knowing, or of controlling, which platform threads get executed when. So your two parallel threads might actually get scheduled alternately, if the host OS chooses to interleave their execution.
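    To make the "in theory, yes" part concrete, here is a hedged sketch that splits a CPU-bound sum across several virtual threads (Java 21 or later assumed; `ParallelSum` and its parameters are hypothetical names). The code only shows that the work can be distributed and the results combined correctly. Whether the chunks truly run at the same instant is up to the JVM's scheduler and the host OS, as described above. Note also that JEP 444 positions virtual threads mainly for tasks that block, not for long CPU-bound work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelSum {

    // Sums 1..n by splitting the range into `parts` chunks,
    // each computed on its own virtual thread.
    public static long sum(long n, int parts) {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<Long>> futures = new ArrayList<>();
            long chunk = n / parts;
            for (int p = 0; p < parts; p++) {
                long from = p * chunk + 1;
                // The last chunk absorbs any remainder.
                long to = (p == parts - 1) ? n : (p + 1) * chunk;
                futures.add(executor.submit(() -> {
                    long s = 0;
                    for (long i = from; i <= to; i++) {
                        s += i;
                    }
                    return s;
                }));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // waits for that chunk to finish
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Summation failed", e);
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("Cores reported: " + cores);
        System.out.println("Sum: " + sum(1_000_000L, cores));
    }
}
```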

    How, specifically, will that differ from the aforementioned scenario "n+1 threads on an n-core machine "?

    Platform threads are “expensive”, both in terms of memory and in demands on the CPU.

    For memory, the host OS typically assigns a fairly large chunk of memory by default to each thread. Since the host OS knows nothing about the particular Java task, it has no way to know what amount of memory might be optimal. Furthermore, typical implementations of host OS threading can grow that assigned memory but not shrink it.

    In contrast, with Java virtual threads, the JVM manages the needed memory within the JVM, optimizing as it goes. And the JVM can grow and shrink the amount of memory being used by the virtual thread. Again, for details see the video talks mentioned above.

    For CPU, the context switching between platform threads is a lot of work, taking time and many CPU cycles. So having many more platform threads than cores can overburden the computer.

    In contrast, the JVM can very quickly and easily swap out one virtual thread for another, dismounting & mounting onto the host platform thread. Again, for details see the talk on continuations mentioned above.

    So typically we are limited to several or a few dozen platform threads, maybe a couple hundred in some situations. In contrast, having even a million virtual threads at a time can be reasonable on conventional computer hardware.
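    For reference, both kinds of thread are created through the same `Thread` API, so switching between them is a small change in code. A minimal sketch, assuming Java 21 or later; the class name and thread names are invented for the example:

```java
class ThreadKinds {

    // Starts one platform thread and one virtual thread, waits for both,
    // and returns { platform.isVirtual(), virtual.isVirtual() }.
    public static boolean[] describe() {
        Runnable task = () -> { }; // empty task, only the thread kind matters here

        // Backed one-to-one by a thread of the host operating system.
        Thread platform = Thread.ofPlatform().name("platform-worker").start(task);

        // Scheduled by the JVM onto a small pool of platform threads.
        Thread virtual = Thread.ofVirtual().name("virtual-worker").start(task);

        try {
            platform.join();
            virtual.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new boolean[] { platform.isVirtual(), virtual.isVirtual() };
    }

    public static void main(String[] args) {
        boolean[] kinds = describe();
        System.out.println("platform isVirtual: " + kinds[0]);
        System.out.println("virtual isVirtual: " + kinds[1]);
    }
}
```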