I am trying to understand more about parallelism, but I've noticed there are a lot of different terms out there; some seem to mean the same thing, while others have notable differences. So, what are all the different types of parallelism, how do they differ from each other, and do any have specific applications or purposes?
(To keep this more focused, I'm hoping for an answer that clarifies all the terminology associated with parallelism, including terms not listed below. Technical comparisons between the different types would be nice, but may push this question off-topic; then again, I don't really know, hence the question.)
Note: this is not a question about concurrency, and it goes beyond the "simple" question "what is parallelism?", although a clarifying definition might be warranted.
First, I am aware of the difference between parallelism and threading, but the distinctions between some of the following terms are still confusing.
To add clarity to my question, here is a list of terms I have found that are related to parallelism: parallel computing, parallel processing, multithreading, multiprocessing, multicore programming, Hyper-threading (Intel), Simultaneous MultiThreading (SMT), Switch-on-Event MultiThreading. (If possible, definitions or references to definitions for each of these terms would also be appreciated.)
My very specific question: what is the difference between thread-level parallelism, instruction-level parallelism, and process-level parallelism (and any other x-level parallelism)?
In a multi-core processor, can parallelism occur within a single core? Is that what Hyper-threading is, and does it require that a single core have, for example, two ALUs that can be used in parallel?
Last one: is there a difference between hardware and software parallelism, aside from the obvious distinction that one happens in hardware and the other in software?
Related resources:
- Process vs Thread
- Parallelism on a GPU
- Hyper-threading
- Concurrency vs Parallelism
- Hyper-threading and gaming
While the subject matter is indeed immensely wide, I will try to take this view, even at the risk of making many opponents object that it oversimplifies the subject matter (the Stack Overflow format is no substitute for other, more complete references):
[PARALLEL]
Instruction-Level Parallelism - ILP - is the simplest case: the CPU architecture has designed and "hard-wired" this particular form of hardware-based parallelism into the silicon. A processor may be ILP4 (four instructions executed at once), or may have a per-instruction width of this parallel-instruction execution, be it ILP2 for some instructions but ILP1 for others; again, the silicon architecture decides what can indeed happen in parallel at the instruction level. Some awkward surprises may arise from further details, as memory-controller channels may block the ILP mode in cases where REG/MEMORY uops have to wait for a free channel to access the instructed MEMORY.
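As a minimal illustration (in C, with illustrative names and counts - nothing here is tied to a specific processor), the first loop below keeps four independent accumulators, so a superscalar core with enough ALUs may issue the four adds in parallel, while the second loop forms one long dependency chain that serializes at the instruction level. How much this actually helps depends on the real micro-architecture and on what the compiler does at a given optimization level:

```c
#include <stdio.h>

#define N 100000000L   /* arbitrary work size, an assumption for the demo */

int main(void)
{
    /* Four independent accumulators: no data dependency between them,
       so an ILP4-capable core may execute the four adds at once.      */
    double a = 0.0, b = 0.0, c = 0.0, d = 0.0;
    for (long i = 0; i < N; i += 4) {
        a += 1.0;
        b += 1.0;
        c += 1.0;
        d += 1.0;
    }

    /* One dependency chain: each add needs the previous result,
       so the core must wait; ILP cannot help here.              */
    double s = 0.0;
    for (long i = 0; i < N; ++i)
        s += 1.0;

    /* Printing keeps the compiler from discarding the loops outright;
       an optimizer may still transform them heavily.                  */
    printf("%f %f\n", a + b + c + d, s);
    return 0;
}
```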
Hardware-threads are the next level of granularity. Given a CPU-core is declared to support two hardware threads, these are the only streams of code execution that may flow in parallel (if no O/S request comes to instantiate and schedule another thread to get executed, mapped onto one of the available CPU-core hardware-threads). From the user perspective, there are O/S tools that permit one to explicitly "nail down" a process-level-PID / thread-level-PID affinity onto particular CPU-core(s) and thus limit or even eliminate any "disturbance", so as to move from a "just"-[CONCURRENT] flow of code execution closer to a true-[PARALLEL] one.
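On Linux, for instance, such an affinity can be "nailed down" from the shell with taskset -c 2 ./a.out, or programmatically; the minimal sketch below (assuming a Linux box where CPU core #2 exists - the core number is an arbitrary choice) pins the calling process onto that single core:

```c
#define _GNU_SOURCE          /* exposes sched_setaffinity() and CPU_* macros */
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(2, &set);        /* core #2: an arbitrary, illustrative choice */

    /* pid == 0 means "the calling process/thread" */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    puts("pinned onto CPU core #2");
    return 0;
}
```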
We will knowingly skip all the crowds of threads that are just a tool for latency-masking (be it on the SIMT / SMX warp-wide GPU-scheduler, or the more relaxed, MIMT O/S-kernel-driven multithreading).
Software-operated distributed-systems parallelism ought to be mentioned for completeness, but it carries, in principle, the highest adverse costs: the need to invent, define, implement and operate the setup / coordination in software (all of which makes the overheads grow remarkably), in the sense of the re-formulated Amdahl's Law, right due to the need to somehow design and keep operational the non-native orchestration of both the distributed process execution and of all the dataflow it depends on.
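For a sense of where those overheads bite, here is a hedged sketch of such an overhead-aware re-formulation of Amdahl's Law (the symbols o_setup and o_coord are illustrative labels for the added setup / coordination cost fractions, not canonical notation):

```latex
% p : the parallelisable fraction of the work
% N : the number of workers
% o_setup, o_coord(N) : added overhead fractions (illustrative symbols)
S(N) \;\leq\; \frac{1}{\,(1 - p) \;+\; \frac{p}{N} \;+\; o_{\text{setup}} \;+\; o_{\text{coord}}(N)\,}
```

Once the o-terms grow with N, adding more distributed workers can make the net speedup worse, not better.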
Hardware-based true-[PARALLEL] systems are at the highest level of orchestration, where both the silicon (like the InMOS network of meshed Transputers) and also the programming language (like the InMOS' occam or occam-pi) provide the carefully engineered, conceptually crafted true-[PARALLEL] code-execution.
- MIMT: Multiple Instruction Multiple Threads - a non-restricted thread-execution fabric / policy, where any thread may (and does) issue a different instruction to the processor for execution, as opposed to SIMT.
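To make that contrast concrete, here is a minimal pthreads sketch (function names are illustrative; compile with gcc -pthread) in which each thread executes its own, different instruction stream - MIMT-style - rather than marching in SIMT lock-step:

```c
#include <pthread.h>
#include <stdio.h>

/* Thread 1: an integer instruction stream */
static void *integer_work(void *arg)
{
    (void)arg;
    long s = 0;
    for (long i = 0; i < 1000000; ++i)
        s += i;
    printf("integer thread: %ld\n", s);
    return NULL;
}

/* Thread 2: a floating-point instruction stream, entirely different code */
static void *float_work(void *arg)
{
    (void)arg;
    double s = 0.0;
    for (long i = 0; i < 1000000; ++i)
        s += 0.5;
    printf("float thread:   %f\n", s);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    /* Two threads, two different instruction streams, no lock-step */
    pthread_create(&t1, NULL, integer_work, NULL);
    pthread_create(&t2, NULL, float_work, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```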