Tags: c++, multithreading, parallel-processing, pipelining

C++ Parallelization Without Threads?


I recently viewed this answer discussing pipelining. The question asked why a loop that summed two lists into two separate variables was faster than one that xor-ed the same lists into a single variable. The linked answer concluded that the two sums could run in parallel, while each xor had to be computed one after the other, which produced the observed effect.

I do not understand. Doesn't efficient parallelization require multiple threads? How can these additions be run in parallel on only one thread?

Additionally, if the compiler is so smart that it can magick in a whole new thread, why can't it just create two variables in the second function, execute the xors in parallel, and then xor the two variables back together after the loop terminates? To any human, such an optimization would be obvious. Is it harder to program such an optimization into the compiler than I realize?

Any explanation would be greatly appreciated!


Solution

  • CPUs are built around a pipeline. Executing an instruction involves several stages (decoding the instruction, reading registers, doing the arithmetic, reading/writing main memory, writing results back, ...), and each instruction must pass through these stages one after the other. Various optimizations allow the pipeline to do this job more efficiently.

    So in fact, multiple instructions are in flight in the CPU at the same time, but each stage of the pipeline works on only one instruction at a time. The pipeline also introduces hazards, such as a read-after-write dependency, but there are ways to deal with them (e.g. stalling, or inserting nop instructions).

    This has nothing to do with multithreading, which is a higher-level concept. Here we are at a lower level, namely how the CPU executes instructions. The link provided in the answer you mentioned is a nice starting point (link)