assemblyoptimizationx86micro-optimization

Test whether a register is zero with CMP reg,0 vs OR reg,reg?


Is there any execution speed difference using the following code:

cmp al, 0
je done

and the following:

or al, al
jz done

I know that the JE and JZ instructions are the same, and also that using OR gives a size improvement of one byte. However, I am also concerned with code speed. It seems that logical operators will be faster than a SUB or a CMP, but I just wanted to make sure. This might be a trade-off between size and speed, or a win-win (of course the code will be more opaque).


Solution

  • It depends on the exact code sequence, which specific CPU it is, and other factors.

    The main problem with or al, al, is that it "modifies" EAX, which means that a subsequent instruction that uses EAX in some way may stall until this instruction completes. Note that the conditional branch (jz) also depends on the instruction, but CPU manufacturers do a lot of work (branch prediction and speculative execution) to mitigate that. Also note that in theory it would be possible for a CPU manufacturer to design a CPU that recognises EAX isn't changed in this specific case, but there are hundreds of these special cases and the benefits of recognising most of them are too little.

    The main problem with cmp al,0 is that it's slightly larger, which might mean slower instruction fetch/more cache pressure, and (if it is a loop) might mean that the code no longer fits in some CPU's "loop buffer".

    As Jester pointed out in comments; test al,al avoids both problems - it's smaller than cmp al,0 and doesn't modify EAX.

    Of course (depending on the specific sequence) the value in AL must've come from somewhere, and if it came from an instruction that set flags appropriately it might be possible to modify the code to avoid using another instruction to set flags again later.