c++optimizationpipeliningbranchless

Is it the case that comparisons imply a branch?


I was reading the Wikibooks page on optimization and I came across the line:

For pipelined processors, comparisons are slower than differences, because they imply a branch.

Why is it the case that comparisons imply a branch? For example if:

int i = 2;
int x = i<5;

Is there a branch in this? It makes sense to me to branch for if statements with conditionals but I don't understand why a comparison alone causes a branch.


Solution

  • Preamble: Modern compilers are capable of eliminating branches in various ways. Thus, none of the examples necessarily results a branch in the final (assembler or machine) code.

    So why does the logic basically imply branches?

    The code

    bool check_interval_branch(int const i, int const min_i, int const max_i)
    {
      return min_i <= i && i <= max_i;
    } 
    

    can be logically rewritten to be:

    bool check_interval_branch(int const i, int const min_i, int const max_i)
    {
      if (min_i <= i) 
      { 
        if (i < max_i) return true; 
      }
      return false;
    } 
    

    Here you obviously have two branches (where the second one is only executed if the first one is true - short circuit) which can be mispredicted by the branch predictor which in turn leads to a reroll of the pipeline.

    Visual Studio 2013 (with optimization turned one) generates the following assembly containing two branches for check_interval_branch:

      push  ebp
      mov   ebp, esp
      mov   eax, DWORD PTR _i$[ebp]
      cmp   DWORD PTR _min_i$[ebp], eax    // comparison
      jg    SHORT $LN3@check_inte          // conditional jump
      cmp   eax, DWORD PTR _max_i$[ebp]    // comparison
      jg    SHORT $LN3@check_inte          // conditional jump
      mov   al, 1
      pop   ebp
      ret   0
    $LN3@check_inte:
      xor   al, al
      pop   ebp
      ret   0
    

    The code

    bool check_interval_diff(int const i, int const min_i, int const max_i)
    {
      return unsigned(i - min_i) <= unsigned(max_i - min_i);
    }
    

    is logically identical to

    bool check_interval_diff(int const i, int const min_i, int const max_i)
    {
      if (unsigned(i – min_i) <= unsigned(max_i – min_i)) { return true; }
      return false;
    }
    

    which contains only a single branch but executes two differences.

    The generated code for check_interval_diff of Visual Studio 2013 doesn't even contain a conditional jump.

      push  ebp
      mov   ebp, esp
      mov   edx, DWORD PTR _i$[ebp]
      mov   eax, DWORD PTR _max_i$[ebp]
      sub   eax, DWORD PTR _min_i$[ebp]
      sub   edx, DWORD PTR _min_i$[ebp]
      cmp   eax, edx                    // comparison
      sbb   eax, eax
      inc   eax
      pop   ebp
      ret   0
    

    (The trick here is that the subtraction done by sbb is different by 1 according to the carry flag which in turn is set to 1 or 0 by the cmp instruction.)

    In fact you see three differences here (2x sub, 1x sbb).

    It probably depends on your data / the use case which one is faster.

    See Mysticals answer here about branch prediction.

    Your code int x = i<5; is logically identical to

    int x = false;
    if (i < 5)
    {
      x = true;
    }
    

    which again contains a branch (x = true only executed if i < 5.)