cpucpu-architectureriscvaluriscv32

RISCV: how the branch intstructions are calculated?


I am trying to understand how modern CPU works. I am focused on RISC-V. there are a few types of branches:

I use a venus simulator to test this and also I am trying to simulate it as well and so far so good it works, but I cannot understand, how are branches calculated. From what I have read, the ALU unit has just one signal output - ZERO (apart from its math output) which is active whenever the output is zero. But just how can I determine if the branch should be taken or not based just on the ZERO output? And how are they calculated?

Example code:

addi t0, zero, 9
addi t1, zero, 10
blt t0, t1, end
end:

Example of branches:

BEQ - subtract 2 numbers, if ZERO is active, branch

BNE - subtract 2 numbers, if ZERO is not active, branch

BLT - and here I am a little bit confused; should I subtract and then look at the sign bit, or what?

BGE / BGEU - and how to differentiate these? What math instructions should I use?


Solution

  • Yes, the ZERO output gives you equal / not-equal. You can also use XOR instead of SUB for equality comparisons if that runs faster (ready earlier in a partial clock cycle) and/or uses less power (fewer transistors switching).

    Fun fact: MIPS only has eq / ne and signed-compare-against-zero branch conditions, all of which can be tested fast without carry propagation or any other cascading bits. That mattered because it checked branch conditions in the first half cycle of exec, in time to forward to fetch, keeping branch latency down to 1 cycle which the branch-delay slot hid on classic MIPS pipelines. For other conditions, like blt between two registers, you need slt and branch on that. RISC-V has true hardware instructions for blt between two registers, vs. MIPS's bltz against zero only.


    Why use an ALU with only a zero output? That makes it unusable for comparisons other than exact equality.

    You need other outputs to determine GT / GE / LE / LT (and their unsigned equivalents) from a subtract result.

    For unsigned conditions, all you need is zero and a carry/borrow (unsigned overflow) flag.


    The sign bit of the result on its own is not sufficient for signed conditions because signed overflow is possible: (-1) - (-2) = +1 : -1 > -2 (signbit clear) but (8-bit wraparound) 0x80 - 0x7F = +1 (signbit also clear) but -128 < 127. The sign bit of a number on its own is only useful if comparing against zero.

    If you widen the result (by sign-extending the inputs and doing one more bit of add/sub) that makes signed overflow impossible so that 33rd bit is a signed-less-than result directly.

    You can also get a signed-less-than result from signed_overflow XOR signbit instead of actually widening + adding. You might also want an ALU output for signed overflow, if RISC-V has any architectural way for software to check for signed-integer overflow.

    Signed-overflow can be computed by looking at the carry in and carry out from the MSB (the sign bit). If those differ, you have overflow. i.e. SF = XOR of those two carries. See also http://teaching.idallen.com/dat2343/10f/notes/040_overflow.txt for a detailed look at unsigned carry vs. signed overflow with 2-bit and 4-bit examples.


    In CPUs with a FLAGS register (e.g. x86 and ARM), those ALU outputs actually go into a special register with named bits. You can look at an x86 manual for conditional-jump instructions to see how condition names like l (signed less-than) or b (unsigned below) map to those flags:

    signed conditions:

    If you decide to have your ALU just produce a "signed-less-than" output instead of separate SF and OF outputs, that's fine. SF==OF is just !(SF != OF).

    (x86 also has some mnemonic synonyms for the same opcode, like jl = jnge. There are "only" 16 FLAGS predicates, including OF=0 alone (test for overflow, not a compare result), and the parity flag. You only care about the actual signed/unsigned compare conditions.)

    If you think through some example cases, like testing that INT_MAX > INT_MIN you'll see why these conditions make sense, like that example I showed above for 8-bit numbers.

    unsigned:

    (Note that x86 subtract sets CF = borrow output, so 1 - 2 sets CF=1. Some other ISAs (e.g. ARM) invert the carry flag for subtract. When implementing RISC-V this will all be internal to the CPU, not architecturally visible to software.)

    I don't know if RISC-V actually has all of these different branch conditions, but x86 does.

    There might be simpler ways to implement a signed or unsigned comparator than doing subtraction at all.

    But if you already have an add/subtract ALU and want to piggyback on that then you might just want it to generate Carry and Signed-less-than outputs as well as Zero.

    That way you don't need a separate sign-flag output, or to grab the MSB of the integer result. It's just one extra XOR gate inside the ALU to combine those two things.