c++ performance loops bit-manipulation

Are bitwise operators slower than common loops like a for loop?


Do bitwise operators like & (AND) take more runtime than common for loops?

Today I was doing a power of 2 question in LeetCode. My code was this:

if (n > 0 && (n & (n - 1)) == 0) {
    return true;
}
return false;

It gave me a runtime of 5ms. I tried resubmitting to see if there was any variance in runtime but it was the same.

So then I wrote my code using a for loop:

for (int i = 0; i <= 30; i++){
    int ans = pow(2, i);
    if (n == ans){
        return true;
    }
}
return false;

The runtime became 0 ms.

I was hoping for a faster runtime with my bitwise-operator code, but instead it was the slower of the two.


Solution

  • All of the common binary arithmetic and logical operations, with the sole exception of division, take about the same time (often a single cycle) on modern computer architectures. Division can be 10-20x slower.

    The problem here is with Leetcode's dodgy benchmarking. The second method is doing way more work! pow() has a significant execution time and is called up to 31 times. There is no way that it can possibly win here.
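    To see why the first method is so cheap, note that n & (n - 1) clears the lowest set bit of n, so the result is zero exactly when n has a single bit set, i.e. when n is a power of two. A minimal sketch (the helper name is mine, not from the question):

    ```cpp
    #include <cassert>

    // n & (n - 1) clears the lowest set bit of n, so the result is 0
    // exactly when n has a single set bit, i.e. n is a power of two.
    // The n > 0 guard excludes zero and negative values.
    bool is_pow2_bitwise(int n)
    {
        return n > 0 && (n & (n - 1)) == 0;
    }

    int main()
    {
        assert(is_pow2_bitwise(1));   // 0b0001
        assert(is_pow2_bitwise(8));   // 0b1000
        assert(!is_pow2_bitwise(6));  // 0b0110 & 0b0101 == 0b0100 != 0
        assert(!is_pow2_bitwise(0));  // zero is not a power of two
    }
    ```

    This is a handful of single-cycle instructions regardless of n, whereas the loop version pays for every iteration.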

    The second loop could be rewritten to avoid the pow function call quite easily but even then it would still be doing around 20x more work than the first method.

    bool better_forloop(const int n)
    {
       int ans = 1;
       for (int i = 0; i <= 30; i++)
       {
          if (n == ans) return true;
          ans *= 2;
       }
       return false;
    }
    

    @TedLyngmo makes a very valid point: if statements and for loops each carry an implicit additional cost for the conditional branch, so if you can compute a true/false result and return it directly, that is always better.
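    Applying that point to the question's first snippet, the if/return pair collapses into a single returned expression; a sketch (function name is mine):

    ```cpp
    #include <cassert>

    // Returning the boolean expression directly avoids the explicit
    // if/else branch on the result; the compiler can emit a
    // flag-setting sequence (test/sete) with no conditional jump
    // for the comparison itself.
    bool bitwise_branchless(int n)
    {
        return n > 0 && (n & (n - 1)) == 0;
    }

    int main()
    {
        assert(bitwise_branchless(1 << 30));
        assert(!bitwise_branchless(3));
        assert(!bitwise_branchless(-8));
    }
    ```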

    I altered the code to return an int and converted it to classic C so that it would be minimal code on Godbolt. The results are interesting: GCC 14.1 made by far the best fist of it, Intel ICX less so, and MSVC's output was risible even at -O3.

    This is the Godbolt assembler output from GCC 14.1, which was the densest and most efficient code generation I found. The first method is 8 instructions for any n; the second is 11 + i*(10 + time for pow()), roughly 120 cycles minimum and ~3000 maximum (when the loop runs all 31 iterations).

    bitwise:
        xor     eax, eax
        test    edi, edi
        jle     .L1
        lea     eax, [rdi-1]
        test    eax, edi
        sete    al
        movzx   eax, al
    .L1:
        ret
    forloop:
        push    rbp
        mov     ebp, edi
        push    rbx
        xor     ebx, ebx
        sub     rsp, 8
        jmp     .L7
    .L12:
        add     ebx, 1
        cmp     ebx, 31
        je      .L11
    .L7:
        pxor    xmm1, xmm1
        movsd   xmm0, QWORD PTR .LC0[rip]
        cvtsi2sd        xmm1, ebx
        call    pow
        cvttsd2si       eax, xmm0
        cmp     eax, ebp
        jne     .L12
        add     rsp, 8
        mov     eax, 1
        pop     rbx
        pop     rbp
        ret
    .L11:
        add     rsp, 8
        xor     eax, eax
        pop     rbx
        pop     rbp
        ret
    .LC0:
        .long   0
        .long   1073741824
    

    A quick and dirty guesstimate is that each instruction takes about one cycle on average, except for call pow, which costs around 100 machine cycles. pow has to compute log 2, multiply it by i and then compute the exponential, which takes ages.
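    For integer powers of two there is no need for pow at all: a left shift computes the same values, and the resulting loop agrees with the O(1) bitwise test everywhere. A quick cross-check sketch (function names are mine) under the assumption of 32-bit int:

    ```cpp
    #include <cassert>

    // O(1) bitwise power-of-two test from the question.
    bool bitwise(int n) { return n > 0 && (n & (n - 1)) == 0; }

    // Loop version with pow(2, i) replaced by an integer shift:
    // 1 << i == 2^i for 0 <= i <= 30 on a 32-bit int.
    bool shift_loop(int n)
    {
        for (int i = 0; i <= 30; i++)
            if (n == (1 << i)) return true;
        return false;
    }

    int main()
    {
        // Both methods agree over a large range of inputs.
        for (int n = 0; n <= (1 << 20); n++)
            assert(bitwise(n) == shift_loop(n));
    }
    ```

    Even this pow-free loop still does up to 31 compare-and-branch iterations per call, so the bitwise version remains the clear winner in real terms whatever Leetcode's timer says.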