
SMT and Hyperthreading: threads vs process


I understand SMT in general and the concept of hardware threads (I think), and I'd like my understanding validated or corrected. Basically, HW threads are different from SW threads. We could run different SW threads or even different processes on an SMT core simultaneously, right? An SMT core does not differentiate between process1 and process2: to the HW, they are just two threads. Is that correct?


Solution

  • Yes, your understanding is correct: the concept of hardware threads doesn't really relate to the distinction between (OS-level) threads and processes. For example, it doesn't somehow limit two SMT threads to only running software threads from the same process.[1]

    The use of the term hardware thread is a bit confusing, since thread already had a specific meaning in the software world. As Peter pointed out in the comments, you might prefer logical core instead. So a single hyperthreaded package might have 2 physical cores and 4 logical cores. We refer to that as 2c4t (yes, the t is again for thread).

    It might be easiest to think of this in terms of abstractions. The key abstraction hardware offers to software is the CPU. Fifteen years ago, your desktop had 1 CPU, and it was the same as the 1 CPU you'd see under the fan if you opened the case. Today, a single physical package (the thing you see plugged into the socket under the fan) usually appears as multiple CPUs to the operating system.

    In particular, a 2c4t physical CPU will mostly appear as 4 CPUs to the OS. The OS mostly doesn't care whether that is 2 physical cores with 4 logical cores, 1 physical core with 4 logical cores (not common on Intel but common elsewhere), 4 physical cores with 1 logical thread each, or even 4 separate physical CPUs with 1 core each on a large multi-socket server motherboard. How the hardware implements the CPUs it presents is a performance concern, not really a functional one. In user software, for example, when you query the number of CPUs, you get the total number of hardware threads, no matter how they are physically implemented.[2]
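    As a small illustration of that last point, here is a sketch in Python (my choice of language; the helper names are hypothetical). `os.cpu_count()` is documented to return the number of logical CPUs, so a 2c4t package and four single-threaded cores both report 4. `os.sched_getaffinity` is Linux-only, so the fallback for other platforms is an assumption.

```python
import os


def logical_cpu_count() -> int:
    """Return the number of logical CPUs (hardware threads) the OS reports.

    A 2c4t package and four single-threaded cores both report 4 here.
    """
    n = os.cpu_count()
    # os.cpu_count() can return None when the count cannot be determined.
    return n if n is not None else 1


def usable_cpus() -> set:
    """Return the logical CPU numbers this process may be scheduled on."""
    if hasattr(os, "sched_getaffinity"):  # available on Linux, not macOS/Windows
        return set(os.sched_getaffinity(0))
    # Assumption for other platforms: treat every reported logical CPU as usable.
    return set(range(logical_cpu_count()))


if __name__ == "__main__":
    print("logical CPUs reported:", logical_cpu_count())
    print("logical CPUs this process may use:", sorted(usable_cpus()))
```

    The second number can be smaller than the first, e.g. inside a container or under `taskset`, but neither tells you which logical CPUs share a physical core.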

    So understanding that abstraction helps answer this:

    We could run different SW threads or even different processes on an SMT core simultaneously, right?

    Yes - whatever you could do on 2 physical CPUs, you can do on 2 cores, or 2 logical cores on the same physical core. The abstraction the hardware presents is the same.
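    One way to see that the abstraction is the same: on Linux, the interface for restricting where a process runs takes a logical CPU number and has no separate notion of "the sibling hyperthread". A minimal sketch, assuming Linux (`os.sched_setaffinity` does not exist on macOS or Windows); `pin_self_to` is a hypothetical helper name:

```python
import os


def pin_self_to(cpu: int) -> set:
    """Restrict the calling process to a single logical CPU and return the
    resulting affinity mask. Linux-only.

    The same call is used whether `cpu` is an SMT sibling of another busy
    logical CPU or a completely separate physical core.
    """
    os.sched_setaffinity(0, {cpu})  # pid 0 means "the calling process"
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    original = os.sched_getaffinity(0)
    target = min(original)
    print("pinned to", sorted(pin_self_to(target)))
    os.sched_setaffinity(0, original)  # restore the original mask
```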

    Then there is the question of software processes and threads. This is mostly an abstraction the operating system presents to userland software. The CPU doesn't really have this concept at all: it offers an "execution context" per CPU in which to run something, plus a bunch of additional services that modern OSes need, such as multiple privilege levels (to implement the user/kernel split), memory protection, interrupts, paging/memory-management unit services, etc.

    The operating system uses those facilities to implement its concept of processes and threads, but the CPU doesn't care. For example, processes usually have separate virtual address spaces, while threads share one. The CPU supports this by having an MMU, but it doesn't have a binary concept of process vs. thread: you could very well have something in between that shares only part of the address space. Much of the non-virtual-memory difference between processes and threads lies entirely outside the CPU's domain: separate sets of open files, separate permissions and capabilities, working directories, environment variables, and so on.
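    To make the OS-level distinction concrete, here is a minimal Python sketch (Unix-only, since it relies on `os.fork`; the function names are hypothetical). A write made by a second thread is visible to the caller, and both report the same process ID. A write made by a forked child process lands in the child's own copy of the address space, so the parent never sees it.

```python
import os
import threading

shared = {"value": 0}


def set_value() -> None:
    shared["value"] = 42


def worker(pids: list) -> None:
    set_value()
    pids.append(os.getpid())  # a thread reports its process's pid


def write_from_thread() -> tuple:
    """Write from a second thread; return (value seen by caller, thread's pid)."""
    shared["value"] = 0
    pids = []
    t = threading.Thread(target=worker, args=(pids,))
    t.start()
    t.join()
    return shared["value"], pids[0]


def write_from_child_process() -> tuple:
    """Write from a forked child; return (value seen by parent, child's pid)."""
    shared["value"] = 0
    pid = os.fork()
    if pid == 0:
        set_value()   # modifies only the child's private copy of memory
        os._exit(0)   # exit the child immediately, skipping cleanup handlers
    os.waitpid(pid, 0)
    return shared["value"], pid


if __name__ == "__main__":
    print("thread:", write_from_thread(), "main pid:", os.getpid())
    print("child process:", write_from_child_process())
```

    Nothing here tells the CPU which case it is running; the difference comes entirely from how the OS set up the address spaces.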

    Understanding the process/thread abstraction helps answer this other part of your question:

    SMT core does not differentiate between process1 and process2, to the HW, they are just two threads[?]

    At the CPU level there is likely to be a difference between two threads in the same process and threads from different processes: threads within a process share an address space and hence use the same page tables, so they can share TLB entries and, where such caches exist, virtually indexed cache entries. Threads from different processes have different page tables (at least as far as the CPU is concerned).
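    The page-table difference can be glimpsed from user space: after a fork, the parent and child use the same virtual address for an object, yet a write in the child doesn't change what the parent sees there, because each process's page tables map that address to different physical memory once copy-on-write kicks in. A minimal sketch, assuming a Unix system and CPython (it relies on the CPython implementation detail that `id()` returns an object's memory address); the function name is hypothetical:

```python
import os


def address_and_value_after_child_write() -> tuple:
    """Fork a child that overwrites a buffer and reports the buffer's address.

    Returns (parent_address, child_address, parent_contents).
    """
    buf = bytearray(b"A")
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                               # child process
        os.close(r)
        buf[0] = ord("B")                      # copy-on-write gives the child its own page
        os.write(w, str(id(buf)).encode())     # report the virtual address it used
        os._exit(0)
    os.close(w)
    child_address = int(os.read(r, 64).decode())
    os.close(r)
    os.waitpid(pid, 0)
    return id(buf), child_address, bytes(buf)


if __name__ == "__main__":
    parent_addr, child_addr, contents = address_and_value_after_child_write()
    print(hex(parent_addr), hex(child_addr), contents)
```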

    On the CPUs I'm aware of, that applies whether the software threads are running on sibling hardware threads or on distinct physical cores. However, Peter Cordes points out that one possible design point would be an SMT core that places additional restrictions on the page tables of running siblings, e.g., that they be identical, so that only threads from the same process can run concurrently on a physical core.


    [1] That seemed to be one of your concerns, but it wasn't entirely clear.

    [2] To be clear, no modern OS is totally ignorant of the mapping between physical cores and the one or more logical cores they contain: among other things, it uses that information to optimize scheduling. For example, if you had two processes running on your 2c4t box, it would usually be silly to run both on the same physical core and leave the other idle, since performance will generally be lower that way. This is no different from NUMA, where the fundamental high-level abstraction (a single homogeneous shared memory space) coexists with low-level performance concerns that leak through the abstraction (not all memory accesses have uniform cost). The goal is that the lowest levels of the software stack (OS, threading libraries, memory allocators, etc.) mostly handle this stuff, so user software can keep working with the high-level abstraction.
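    On Linux, the topology the OS uses for these scheduling decisions is exposed in sysfs: each `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list` file lists the logical CPUs that share CPU N's physical core. The sketch below (Linux-only; helper names are hypothetical) groups logical CPUs by physical core. How the groups are numbered varies by machine, so no particular output should be expected; on other platforms it simply returns an empty set.

```python
from pathlib import Path

SYS_CPU = Path("/sys/devices/system/cpu")


def parse_cpu_list(text: str) -> frozenset:
    """Parse a Linux CPU list such as '0,4' or '0-1,8-9' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return frozenset(cpus)


def physical_core_groups() -> set:
    """Return one frozenset of logical CPUs per physical core (Linux sysfs only)."""
    groups = set()
    for topo in SYS_CPU.glob("cpu[0-9]*/topology/thread_siblings_list"):
        # Siblings list the same group, so the set de-duplicates them.
        groups.add(parse_cpu_list(topo.read_text()))
    return groups


if __name__ == "__main__":
    for group in sorted(physical_core_groups(), key=min):
        print("physical core ->", sorted(group))
```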