Do bitwise operators like &
(AND) take more runtime than common for
loops?
Today I was doing a power of 2 question in LeetCode. My code was this:
if (n > 0 && (n & (n - 1)) == 0) {
return true;
}
return false;
It gave me a runtime of 5ms. I tried resubmitting to see if there was any variance in runtime but it was the same.
So then I wrote my code using a for
loop:
for (int i = 0; i <= 30; i++){
int ans = pow(2, i);
if (n == ans){
return true;
}
}
return false;
The runtime became 0 ms.
I was hoping for a faster runtime with my bitwise operator code but instead I got the slowest one.
All of the common binary arithmetic and logical operations with the sole exception of divide take about the same time (often a single cycle) on modern computer architectures. Division can be 10-20x slower.
The problem here is with Leetcode's dodgy benchmarking. The second method is doing way more work! pow()
has a significant execution time and is called 31x. There is no way that it can possibly win here.
The second loop could be rewritten to avoid the pow
function call quite easily but even then it would still be doing around 20x more work than the first method.
bool better_forloop(const int n)
{
int ans = 1;
for (int i = 0; i <= 30; i++)
{
if (n == ans) return true;
ans *= 2;
}
return false;
}
@TedLyngmo makes a very valid point if statements and for loops each have an implicit additional cost for the conditional branch so that if you can compute a true/false result and return it then that is always better.
I altered the code to return an int
and into classic C so that it would be minimal code on Godbolt. The results are interesting. GCC 14.1 made by far the best fist of it. Intel ICX less so and MSVC risible even at -O3.
This is the Godbolt assembler output from GCC 14.1 which was the densest and most efficient code generation I found. First method 8 instructions for any n
, second method 11 + n*(10+time for pow())
~120 (min) and a maximum ~3000 (when n=30).
bitwise:
xor eax, eax
test edi, edi
jle .L1
lea eax, [rdi-1]
test eax, edi
sete al
movzx eax, al
.L1:
ret
forloop:
push rbp
mov ebp, edi
push rbx
xor ebx, ebx
sub rsp, 8
jmp .L7
.L12:
add ebx, 1
cmp ebx, 31
je .L11
.L7:
pxor xmm1, xmm1
movsd xmm0, QWORD PTR .LC0[rip]
cvtsi2sd xmm1, ebx
call pow
cvttsd2si eax, xmm0
cmp eax, ebp
jne .L12
add rsp, 8
mov eax, 1
pop rbx
pop rbp
ret
.L11:
add rsp, 8
xor eax, eax
pop rbx
pop rbp
ret
.LC0:
.long 0
.long 1073741824
A quick and dirty guestimate is that each instruction takes about 1 cycle on average except for call pow
which is about 100 machine cycles. Pow
has to compute log 2
, multiply by i
and then compute its exponential which takes ages.