multithreading cpu-architecture hyperthreading

SMT and Hyperthreading: threads vs. processes


I understand SMT in general and the concept of hardware threads (I think). I wanted my understanding to be validated or corrected here. Basically, HW threads are different from SW threads. We could run different SW threads or even different processes on an SMT core simultaneously, right? SMT core does not differentiate between process1 and process2; to the HW, they are just two threads. Is that correct?


Solution

  • Yes, your understanding is correct: the concept of hardware threads doesn't really relate to the distinction between (OS-level) threads and processes. For example, it doesn't somehow limit two SMT threads to only running software threads from the same process[1].

    The use of the term hardware thread is a bit confusing, since thread already had a specific meaning in the software world. As Peter pointed out in the comments, you might prefer logical core instead. So a single hyperthreaded package might have 2 physical cores and 4 logical cores. We refer to that as 2c4t (yes, the t is again for thread).

    It might be easiest to think of this in terms of abstractions. The key abstraction hardware offers to software is the CPU. 15 years ago, the 1 CPU your desktop presented to the OS was the same as the 1 physical chip you'd see under the fan if you opened the case. Today, a single physical package (the thing you see plugged into the socket under the fan) usually appears as multiple CPUs to the operating system.

    In particular, a 2c4t physical CPU will mostly appear as 4 CPUs to the OS. The OS mostly doesn't care whether that's 2 physical cores and 4 logical cores, versus 1 physical core with 4 logical cores (not common on Intel but common elsewhere), or 4 physical cores with 1 logical thread each, or even 4 separate physical CPUs with 1 core each on a big server motherboard. The way the hardware implements the presented CPUs is only a performance concern, not really a functional one. In user software, for example, when you query the number of CPUs, you really get the total number of hardware threads, no matter how they are physically implemented[2].
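
    As a small illustration of that last point, here is a minimal sketch in plain C++ that asks the standard library how many hardware threads are available. On a 2c4t machine it would typically print 4, however those logical CPUs are implemented physically (the exact value is simply whatever the OS reports).

        #include <iostream>
        #include <thread>

        int main() {
            // Reports the number of logical CPUs (hardware threads) the OS exposes,
            // e.g. 4 on a 2c4t part; may return 0 if the value cannot be determined.
            unsigned n = std::thread::hardware_concurrency();
            std::cout << "logical CPUs: " << n << '\n';
        }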

    So understanding that abstraction helps answer this:

    We could run different SW threads or even different processes on an SMT core simultaneously, right?

    Yes - whatever you could do on 2 physical CPUs, you can do on 2 cores, or 2 logical cores on the same physical core. The abstraction the hardware presents is the same.
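
    To make that concrete, here is a rough Linux-specific sketch that pins two ordinary software threads to logical CPUs 0 and 1 (a hypothetical choice; whether those two are SMT siblings of the same physical core or sit on different cores depends on the machine's topology). Either way the code is identical, because the abstraction presented is just "two CPUs".

        #define _GNU_SOURCE 1
        #include <pthread.h>
        #include <sched.h>
        #include <iostream>
        #include <thread>

        // Pin the calling thread to one logical CPU (Linux-specific; build with -pthread).
        static void work(int cpu) {
            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET(cpu, &set);
            pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
            std::cout << "software thread pinned to logical CPU " << cpu
                      << ", now running on CPU " << sched_getcpu() << '\n';
        }

        int main() {
            // Two ordinary software threads on logical CPUs 0 and 1: whether those
            // CPUs are SMT siblings of one physical core or belong to two different
            // cores makes no functional difference to this program.
            std::thread a(work, 0), b(work, 1);
            a.join();
            b.join();
        }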

    Then there is the question of software processes and threads. This is mostly an abstraction the operating system presents to userland software. The CPU doesn't really have this concept at all: it only offers an "execution context" per CPU on which to run something, plus a bunch of additional services that modern OSes need, such as various privilege levels (to implement the user/kernel split), memory protection, interrupts, paging/memory-management-unit services, and so on.

    The operating system uses those facilities to implement its concept of processes and threads, but the CPU doesn't care. For example, processes usually have separate virtual memory spaces, while threads share one. The CPU supports this by having an MMU - but it doesn't have a binary concept of process vs thread: you could very well have something in the middle that shares some part of the memory space, etc. Much of the non-virtual-memory difference between processes and threads is entirely outside the domain of the CPU: separate sets of open files, separate permissions and capabilities, working directories, environment variables, and so on.
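
    To make the "something in the middle" concrete: on Linux, for example, clone() takes flags that say exactly which resources the new task shares with its creator. The rough sketch below (Linux/glibc; the flag combination is just for illustration) creates a task that shares the address space like a thread but keeps its own copy of the file descriptor table like a process.

        #define _GNU_SOURCE 1
        #include <sched.h>
        #include <sys/wait.h>
        #include <csignal>
        #include <cstdio>
        #include <cstdlib>

        static int shared_value = 0;        // lives in the (shared) address space

        static int child_fn(void*) {
            shared_value = 42;              // visible to the parent because of CLONE_VM
            return 0;
        }

        int main() {
            const size_t stack_size = 1024 * 1024;
            char* stack = static_cast<char*>(std::malloc(stack_size));
            if (!stack) return 1;

            // Share the address space (thread-like) but keep a private copy of the
            // file descriptor table (process-like): neither a textbook "process"
            // nor a textbook "thread".
            pid_t pid = clone(child_fn, stack + stack_size, CLONE_VM | SIGCHLD, nullptr);
            if (pid == -1) { std::perror("clone"); return 1; }

            waitpid(pid, nullptr, 0);
            std::printf("shared_value = %d\n", shared_value);   // prints 42
            std::free(stack);
        }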

    Understanding the process/thread abstraction helps answer this other part of your question:

    SMT core does not differentiate between process1 and process2; to the HW, they are just two threads[?]

    Correct. Not only does SMT not care about processes versus threads; CPUs in general don't care. They offer some functionality for the OS to set up various sharing arrangements between execution contexts (the memory mapping being the big one) - but they don't care how it is used. You won't even really find discussion of a binary distinction between "process" and "thread" in the system programming manual for a CPU.


    [1] That seemed to be one of your concerns, but it wasn't entirely clear.

    [2] To be clear, no modern OS will be totally ignorant of the mapping between physical cores and the 1 or more logical cores they contain - among other things, it uses that information to optimize scheduling. For example, if you had two processes running on your 2c4t box, it would usually be silly for both to run on the same physical core, leaving the other idle, since performance will generally be lower that way. This is no different from something like NUMA, where there is a fundamental high-level abstraction (a single homogeneous shared memory space) alongside low-level performance concerns that leak through the abstraction (not all memory access is uniform). The goal is that the lowest levels of the software stack (OS, threading libraries, memory allocators, etc.) mostly handle this stuff so that user software can keep working with the high-level abstraction.
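
    On Linux, for instance, that mapping between physical cores and their logical CPUs is exposed to user space via sysfs. A small sketch (Linux-specific paths, no error handling) that prints which logical CPUs are SMT siblings of each other:

        #include <fstream>
        #include <iostream>
        #include <string>
        #include <thread>

        int main() {
            unsigned logical = std::thread::hardware_concurrency();   // e.g. 4 on a 2c4t part
            for (unsigned cpu = 0; cpu < logical; ++cpu) {
                // Lists the logical CPUs sharing a physical core with cpuN, e.g. "0,2".
                std::ifstream f("/sys/devices/system/cpu/cpu" + std::to_string(cpu) +
                                "/topology/thread_siblings_list");
                std::string siblings;
                if (std::getline(f, siblings))
                    std::cout << "cpu" << cpu << " siblings: " << siblings << '\n';
            }
        }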