ccompiler-optimizationshort-circuiting

Does the C compiler have to short-circuit && and || unnecessarily?


C short-circuits && and ||.

Unlike the bitwise | operator, the || operator guarantees left-to-right evaluation; if the second operand is evaluated, there is a sequence point between the evaluations of the first and second operands. If the first operand compares unequal to 0, the second operand is not evaluated.

We often use this in idioms like

bool xyz = ptr && *ptr == 42;

But suppose we have very simple operands to the || operator:

int a[3] = {1, 2, 3};
[ ... other code ... ]

int myFunc() {
  int x = a[0] == 1 && a[1] == 2 && a[2] == 3;
}

It seems that if C must guarantee the latter operands are not evaluated, it has to generate something like (forgive my freehanding of poorly-remembered assembly pseudocode):

  xor   eax, eax
  cmp   a+0x0, 0x1 ; a[0] != 1
  jne   end
  cmp   a+0x4, 0x2 ; a[1] != 2
  jne   end
  cmp   a+0x8, 0x3 ; a[2] != 3
  jne   end
  mov   al, 0x1    ; all equal, set 1
  
.end:
  ret

That seems like a lot of branches! My general intuition is that unnecessary branches are bad, because we hit branch prediction issues. (In the past, I have replaced branches with statements like a && b, thinking it would remove the branch! Granted, I think it also makes the code clearer in most cases.)

Here's what I'd expect the non-branching version would look like, and my intuition says we'd prefer this in most cases. (again, forgive the freehand x86):

  xor   eax, eax
  xor   ecx, ecx
  cmp   a+0x0, 0x1 ; a[0] == 1
  sete  al
  cmp   a+0x4, 0x2 ; a[1] == 2
  sete  cl
  and   al, cl
  cmp   a+0x8, 0x3 ; a[2] == 3
  sete  cl
  and   al, cl
  ret

This feels like it is equivalent and removes the branches. (Though maybe loading a page into cache is "side effects" enough?)

I suspect I could encourage the compiler to generate this assembly by making my && into &? However, in general, my experience is that I should trust the compiler to optimize things better than I can.

Related:


Solution

  • In practice, should this affect how I use & and && in parts of my code?

    No it should not. You should write you code in the clearest and most humanly readable form. Microoptimizations like this should only be undertaken if and when profiling of the code has revealed a performance issue.

    Is the compiler required to generate "short-circuit" machine code if there are "no" side effects?

    The compiler is required to generate code that has the same visible effects as short-circuit code in the source. As long as it has enough information to know that is the case, it can generate any code it wants.


    Some current machines will treat short forward conditional branches like this as essentially "conditional skip" instructions -- that is, they'll execute straight through the whole block of code without executing ANY branches (or consuming branch prediction resources, which are precious) and just supress the instructions that should be skipped by the branches. To take advantage of this, you might see code generated where each 'jne' branches to the next 'jne' instruction -- so it would appear to be quite suboptimal, but when executed turns out to be very fast.