optimization language-agnostic cpu cpu-architecture branch-prediction

Why is a CPU branch instruction slow?


Since I started programming, I have read everywhere that wasteful branches should be avoided at all costs.

That's fine, although none of the articles explained why I should do this. What exactly happens when the CPU decodes a branch instruction and decides to do a jump? And what is the "thing" that makes it slower than other instructions (like addition)?

(editor's note: near-duplicate of Why is processing a sorted array faster than processing an unsorted array? which is a good example of easy vs. hard to predict branches leading to mispredictions, with a good answer which explains what's going on.)


Solution

  • A branch instruction is not inherently slower than any other instruction.

    However, the reason you heard that branches should be avoided is that modern CPUs are pipelined: several instructions are in flight at once, each at a different stage of execution. The pipeline can only be fully utilised if the CPU can fetch the next instruction on every cycle, which in turn means it needs to know which instruction to fetch next.

    On a conditional branch, the CPU usually doesn't know ahead of time which path will be taken. When it can't, it must either stall until the branch is resolved, or guess and keep fetching; a wrong guess means throwing away everything in the pipeline that was fetched after the branch. Either way utilisation drops, and performance with it (the sketch after this answer illustrates the effect).

    This is the reason that things like branch prediction and branch delay slots exist.
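
    Below is a minimal sketch of how much branch predictability matters, using the same sorted-vs-unsorted pattern as the question linked in the editor's note. It is my own illustration, not part of the original answer, and all names in it (run_sum, time_ms) are made up for this example. It also assumes the compiler keeps the `if` as a real branch; at high optimisation levels some compilers replace it with a conditional move or vectorise the loop, which hides the difference.

        // Sketch: the loop body is identical in both runs; only the data
        // order changes, so any timing gap comes from branch mispredictions.
        #include <algorithm>
        #include <chrono>
        #include <cstdint>
        #include <iostream>
        #include <random>
        #include <vector>

        // Sum only the elements >= 128; the `if` is the branch under test.
        static std::int64_t run_sum(const std::vector<int>& data) {
            std::int64_t sum = 0;
            for (int v : data) {
                if (v >= 128) {   // ~random outcome on shuffled data,
                    sum += v;     // trivially predictable once data is sorted
                }
            }
            return sum;
        }

        // Time 100 passes over the data and accumulate into `sink`
        // so the work can't be optimised away.
        static double time_ms(const std::vector<int>& data, std::int64_t& sink) {
            auto start = std::chrono::steady_clock::now();
            for (int rep = 0; rep < 100; ++rep) {
                sink += run_sum(data);
            }
            auto end = std::chrono::steady_clock::now();
            return std::chrono::duration<double, std::milli>(end - start).count();
        }

        int main() {
            std::vector<int> data(1 << 20);
            std::mt19937 rng(42);
            std::uniform_int_distribution<int> dist(0, 255);
            for (int& v : data) v = dist(rng);

            std::int64_t sink = 0;
            double unsorted_ms = time_ms(data, sink); // branch outcome is hard to predict
            std::sort(data.begin(), data.end());
            double sorted_ms = time_ms(data, sink);   // branch outcome is easy to predict

            std::cout << "unsorted: " << unsorted_ms << " ms\n"
                      << "sorted:   " << sorted_ms   << " ms\n"
                      << "(sink=" << sink << ")\n";
        }

    On a typical machine, built with optimisation low enough that the branch survives (e.g. -O1), the unsorted run is usually several times slower than the sorted one, even though both perform exactly the same arithmetic; the only difference is how often the branch predictor guesses wrong and the pipeline has to be flushed.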