c++performancecompiler-optimizationdivision

Do most compilers transform % 2 into bit comparison? Is it really faster?


In programming, one often needs to check if a number is odd or even. For that, we usually use:

n % 2 == 0

However, my understanding is that the '%' operator actually performs a division and returns its remainder; therefore, for the case above, it would be faster to simply check the last bit instead. Let's say n = 5;

5 = 00000101

In order to check if the number is odd or even, we just need to check the last bit. If it's 1, the number is odd; otherwise, it is even. In programming, it would be expressed like this:

n & 1 == 0

In my understanding this would be faster than % 2 as no division is performed. A mere bit comparison is needed.

I have 2 questions then:

1) Is the second way really faster than the first (in all cases)?

2) If the answer for 1 is yes, are compilers (in all languages) smart enough to convert % 2 into a simple bit comparison? Or do we have to explicitly use the second way if we want the best performance?


Solution

  • Yes, a bit-test is much faster than integer division, by about a factor of 10 to 20, or even 100 for 128bit / 64bit = 64bit idiv on Intel. Esp. since x86 at least has a test instruction that sets condition flags based on the result of a bitwise AND, so you don't have to divide and then compare; the bitwise AND is the compare.

    I decided to actually check the compiler output on Godbolt, and got a surprise:

    return n % 2 from a signed int n value instead of just testing it for non-zero (if (n % 2)) produces slower code than return n & 1. This is because (-1 % 2) == -1, while (-1 & 1) == 1, so the compiler can't just use a bitwise AND. Compilers still avoid integer division, though, and use some clever shift / and / add / sub sequence instead to get the -1 / 0 / +1 remainder, because that's still cheaper than an integer division. (gcc and clang use different sequences.)

    So if you want to return a 0/non-zero int value based on n % 2, return n%2 != 0. For 2's complement (and sign/magnitude) signed int, that's the same as n&1. Most of us only care how our code optimizes for 2's complement machines, and C++20 dropped the possibility of sign/magnitude or one's complement, making two's complement the only possibility. (To check the opposite condition, n%2 == 0. Not n%2 == 1, that would force it to check the sign bit as well for signed types, if it can't prove the signed int must be non-negative.)

    You get the same benefit from using an unsigned type, since that takes negative numbers out of the picture. Unsigned / and % by powers of 2 are just right shift and bitwise AND, unlike signed where division truncates toward 0 but arithmetic right shift (>>) rounds toward -infinity. This also lets the compiler always optimize it to a single AND instruction. (On godbolt, you can flip to other architectures, like ARM and PowerPC, and see that the unsigned even (%) function and the int even_bit (bitwise &) function have the same asm code.)

    Using a bool (which must be 0 or 1, not just any non-zero value) is another option, return n%2 as a bool is equivalent to return n%2 != 0. But the compiler will have to do extra work to return (bool) (n % 4) (or any test other than n%2) even for unsigned types. The return n & 3 version of that will be 0, 1, 2, or 3, so the compiler has to turn any non-zero value into a 1. (x86 has an efficient setcc instruction that sets a register to 0 or 1, depending on the flags, so it's still only 2 instructions instead of 1. Clang/GCC use this, see aligned4_bool in the godbolt asm output.)

    With any optimization level higher than -O0, gcc and clang optimize if (n%2) to what we expect. The other huge surprise is that icc 13 doesn't. I don't understand WTF icc thinks it's doing with all those branches.