I wanted to see which executes faster for an even-odd check: a modulus operation or a bitwise AND. However, I'm not sure my test is behaving correctly, because the difference is smaller than I expected. I read somewhere online that the bitwise check should be an order of magnitude faster than the modulus check.
Is it possible that it's getting optimized away? I've just started tinkering with assembly; otherwise I'd try to dissect the executable a bit myself.
Here is a working test:
#include <stdio.h>
#include <time.h>
#include <stdint.h>
// to reset the global
static const int SEED = 0x2A;
// 5B iterations, each
static const int64_t LOOPS = 5000000000;
int64_t globalVar;
// gotta call something
int64_t doSomething( int64_t input )
{
    return 1 + input;
}

int main(int argc, char *argv[])
{
    globalVar = SEED;

    // mod
    clock_t startMod = clock();
    for( int64_t i=0; i<LOOPS; ++i )
    {
        if( ( i % globalVar ) == 0 )
        {
            globalVar = doSomething(globalVar);
        }
    }
    clock_t endMod = clock();
    double modTime = (double)(endMod - startMod) / CLOCKS_PER_SEC;

    globalVar = SEED;

    // bit
    clock_t startBit = clock();
    for( int64_t j=0; j<LOOPS; ++j )
    {
        if( ( j & globalVar ) == 0 )
        {
            globalVar = doSomething(globalVar);
        }
    }
    clock_t endBit = clock();
    double bitTime = (double)(endBit - startBit) / CLOCKS_PER_SEC;

    printf("Mod: %lf\n", modTime);
    printf("Bit: %lf\n", bitTime);
    printf("Dif: %lf\n", ( modTime > bitTime ? modTime-bitTime : bitTime-modTime ));
}
Five billion iterations of each loop, with a global variable in the condition to keep the compiler from optimizing the check away, yields the following:
Mod: 93.099101
Bit: 16.701401
Dif: 76.397700
Most compilers will compile both of the following to EXACTLY the same machine instruction(s):
if( ( i % 2 ) == 0 )
if( ( i & 1 ) == 0 )
...even without ANY "optimization" turned on. The reason is that you are MOD-ing and AND-ing with constant values, and a %2 operation is, as any compiler writer should know, functionally equivalent to an &1 operation. In fact, MOD by any power of two has an equivalent AND operation. If you really want to test the difference, you need to make the right-hand side of both operations a variable, and to be absolutely sure the compiler's cleverness isn't thwarting your efforts, you need to bury the variables' initializations somewhere the compiler can't see, so that it cannot know their runtime values at the point of use. One way is to pass the values as parameters into a GLOBALLY-DECLARED (i.e. not 'static') test function, as sketched below; the compiler then can't trace the parameters back to a definition and substitute constants, because in theory any external caller could pass in any values. Alternatively, you could leave the code in main() and make the variables global, in which case the compiler can't substitute constants either, because it can't know whether some other function has altered the globals' values.
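For example, a minimal sketch of the pass-it-in-as-a-parameter approach might look like the following (the function names, the argv handling, and the iteration count are mine, purely for illustration):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// Not 'static': the compiler must assume any external caller could pass
// any divisor, so it cannot replace the % with a constant-folding trick.
int64_t countMultiplesMod( int64_t loops, int64_t divisor )
{
    int64_t hits = 0;
    for( int64_t i = 0; i < loops; ++i )
    {
        if( ( i % divisor ) == 0 )
        {
            ++hits;
        }
    }
    return hits;
}

int64_t countMultiplesAnd( int64_t loops, int64_t mask )
{
    int64_t hits = 0;
    for( int64_t i = 0; i < loops; ++i )
    {
        if( ( i & mask ) == 0 )
        {
            ++hits;
        }
    }
    return hits;
}

int main( int argc, char *argv[] )
{
    // Taking the divisor from argv guarantees the compiler cannot know
    // its value at compile time.
    int64_t divisor = ( argc > 1 ) ? strtoll( argv[1], NULL, 10 ) : 2;
    int64_t mask    = divisor - 1;  // only equivalent when divisor is a power of 2
    printf( "%lld %lld\n",
            (long long)countMultiplesMod( 100000000, divisor ),
            (long long)countMultiplesAnd( 100000000, mask ) );
    return 0;
}

If you then dump the generated assembly (e.g. with gcc -S), you should see that the mod loop still contains a real divide/remainder instruction, while the AND loop does not.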
Incidentally, this same issue exists for divide operations. Division by a constant power of two can be substituted with an equivalent right-shift (>>) operation. The same trick works for multiplication (<<), but the benefit there is smaller (or nonexistent), because hardware multiplication is already fast. True division simply takes a long time in hardware; although significant improvements have been made in modern processors versus even 15 years ago, a division can still take maybe 80 clock cycles, while a >> operation takes only a single cycle. You're not going to see an "order of magnitude" improvement using bitwise tricks on modern processors, but most compilers will still use those tricks because there is still a noticeable improvement.
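As a concrete illustration (this snippet is mine, not part of the benchmark above): for unsigned operands the substitutions are exact, which is why a compiler can freely emit the shift or AND form.

#include <stdint.h>
#include <stdio.h>

int main( void )
{
    uint64_t x = 1000;

    // For unsigned values, dividing by a power of two and shifting right
    // by its log2 produce identical results.
    printf( "%llu %llu\n",
            (unsigned long long)( x / 8 ),
            (unsigned long long)( x >> 3 ) );

    // Likewise for the remainder: x % 8 equals x & 7 for unsigned x.
    printf( "%llu %llu\n",
            (unsigned long long)( x % 8 ),
            (unsigned long long)( x & 7 ) );

    return 0;
}

(For signed operands a plain arithmetic shift rounds negative values the wrong way relative to C's truncate-toward-zero division, so the compiler adds a couple of fix-up instructions; it still avoids the real divide.)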
EDIT: On some embedded processors (and, unbelievable as it seems, even the original SPARC desktop/workstation processors before v8), there isn't a divide instruction at all. On such processors every true divide and mod operation must be performed entirely in software, which can be a monstrously expensive operation. In that sort of environment you surely would see an order of magnitude difference.
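To give a feel for the cost, here is a toy shift-and-subtract (restoring) division of my own making; real library support routines are more sophisticated, but the shape is the same: a loop over every bit of the dividend instead of a single instruction.

#include <stdint.h>
#include <stdio.h>

// Toy software division: one loop iteration per bit of the dividend,
// versus one DIV instruction on processors with hardware divide.
// Assumes divisor != 0.
uint32_t softDiv( uint32_t dividend, uint32_t divisor, uint32_t *remainder )
{
    uint32_t quotient = 0;
    uint32_t rem = 0;
    for( int bit = 31; bit >= 0; --bit )
    {
        rem = ( rem << 1 ) | ( ( dividend >> bit ) & 1u );
        if( rem >= divisor )
        {
            rem -= divisor;
            quotient |= ( 1u << bit );
        }
    }
    *remainder = rem;
    return quotient;
}

int main( void )
{
    uint32_t rem;
    uint32_t quo = softDiv( 1000u, 42u, &rem );
    printf( "1000 / 42 = %u remainder %u\n", quo, rem );   // 23 remainder 34
    return 0;
}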