c++ performance loops bit-manipulation

Are bitwise operators slower than common loops like a for loop?


Do bitwise operators like & (AND) take more runtime than common for loops?

Today I was doing a power of 2 question in LeetCode. My code was this:

if (n > 0 && (n & (n - 1)) == 0) {
    return true;
}
return false;

It gave me a runtime of 5ms. I tried resubmitting to see if there was any variance in runtime but it was the same.

So then I wrote my code using a for loop:

for (int i = 0; i <= 30; i++){
    int ans = pow(2, i);
    if (n == ans){
        return true;
    }
}
return false;

The runtime became 0 ms.

I was hoping for a faster runtime with my bitwise-operator code, but instead it was the slower of the two.


Solution

  • All of the common binary arithmetic and logical operations, with the sole exception of division, take about the same time (often a single cycle) on modern computer architectures. Division can be 10-20x slower.

    The problem here is with Leetcode's dodgy benchmarking. The second method is doing way more work! pow() has a significant execution time and is called up to 31 times. There is no way that it can possibly win here.
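    To see why the first method is so cheap, note that n & (n - 1) clears the lowest set bit of n, so the result is zero exactly when n has a single bit set, i.e. when n is a power of two. A minimal sketch (the helper name is mine, not from the question):

    ```cpp
    #include <cassert>

    // n & (n - 1) clears the lowest set bit of n, so the result is 0
    // exactly when n has a single set bit, i.e. n is a power of two.
    // The n > 0 guard excludes zero and negative values.
    bool is_pow2_bitwise(int n)
    {
        return n > 0 && (n & (n - 1)) == 0;
    }

    int main()
    {
        assert(is_pow2_bitwise(1));   // 0b0001
        assert(is_pow2_bitwise(8));   // 0b1000
        assert(!is_pow2_bitwise(6));  // 0b0110 & 0b0101 == 0b0100 != 0
        assert(!is_pow2_bitwise(0));  // zero is not a power of two
    }
    ```

    This is a handful of single-cycle instructions regardless of n, whereas the loop version pays for every iteration.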

    The second loop could be rewritten to avoid the pow function call quite easily but even then it would still be doing around 20x more work than the first method.

    bool better_forloop(const int n)
    {
       int ans = 1;
       for (int i = 0; i <= 30; i++)
       {
          if (n == ans) return true;
          ans *= 2;
       }
       return false;
    }
    

    @TedLyngmo makes a very valid point: if statements and for loops each carry an implicit additional cost for the conditional branch, so if you can compute a true/false result and return it directly, that is always better.
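    Applying that point to the question's first snippet, the if/return pair collapses into a single returned expression; a sketch (function name is mine):

    ```cpp
    #include <cassert>

    // Returning the boolean expression directly avoids the explicit
    // if/else branch on the result; the compiler can emit a
    // flag-setting sequence (test/sete) with no conditional jump
    // for the comparison itself.
    bool bitwise_branchless(int n)
    {
        return n > 0 && (n & (n - 1)) == 0;
    }

    int main()
    {
        assert(bitwise_branchless(1 << 30));
        assert(!bitwise_branchless(3));
        assert(!bitwise_branchless(-8));
    }
    ```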

    I altered the code to return an int and converted it to classic C so that it would be minimal code on Godbolt. The results are interesting: GCC 14.1 made by far the best fist of it, Intel ICX less so, and MSVC's output was risible even at -O3.

    This is the Godbolt assembler output from GCC 14.1, which was the densest and most efficient code generation I found. The first method is 8 instructions for any n; the second is 11 + i*(10 + time for pow()), roughly 120 cycles minimum and ~3000 maximum (when the loop runs all 31 iterations).

    bitwise:
        xor     eax, eax
        test    edi, edi
        jle     .L1
        lea     eax, [rdi-1]
        test    eax, edi
        sete    al
        movzx   eax, al
    .L1:
        ret
    forloop:
        push    rbp
        mov     ebp, edi
        push    rbx
        xor     ebx, ebx
        sub     rsp, 8
        jmp     .L7
    .L12:
        add     ebx, 1
        cmp     ebx, 31
        je      .L11
    .L7:
        pxor    xmm1, xmm1
        movsd   xmm0, QWORD PTR .LC0[rip]
        cvtsi2sd        xmm1, ebx
        call    pow
        cvttsd2si       eax, xmm0
        cmp     eax, ebp
        jne     .L12
        add     rsp, 8
        mov     eax, 1
        pop     rbx
        pop     rbp
        ret
    .L11:
        add     rsp, 8
        xor     eax, eax
        pop     rbx
        pop     rbp
        ret
    .LC0:
        .long   0
        .long   1073741824
    

    A quick and dirty guesstimate is that each instruction takes about one cycle on average, except for call pow, which costs around 100 machine cycles. pow has to compute log 2, multiply it by i and then compute the exponential, which takes ages.
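    For integer powers of two there is no need for pow at all: a left shift computes the same values, and the resulting loop agrees with the O(1) bitwise test everywhere. A quick cross-check sketch (function names are mine) under the assumption of 32-bit int:

    ```cpp
    #include <cassert>

    // O(1) bitwise power-of-two test from the question.
    bool bitwise(int n) { return n > 0 && (n & (n - 1)) == 0; }

    // Loop version with pow(2, i) replaced by an integer shift:
    // 1 << i == 2^i for 0 <= i <= 30 on a 32-bit int.
    bool shift_loop(int n)
    {
        for (int i = 0; i <= 30; i++)
            if (n == (1 << i)) return true;
        return false;
    }

    int main()
    {
        // Both methods agree over a large range of inputs.
        for (int n = 0; n <= (1 << 20); n++)
            assert(bitwise(n) == shift_loop(n));
    }
    ```

    Even this pow-free loop still does up to 31 compare-and-branch iterations per call, so the bitwise version remains the clear winner in real terms whatever Leetcode's timer says.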