I want to test a bit in a register, namely the second-lowest bit of `%rdi`. Naively I would write `test $2, %edi` (or `and $2, %edi`; I don't know if `and`ing would be better, since the rest of the register is irrelevant at this point).

I checked what clang and gcc generate (for a dummy `void TEST(long X){ if(X&2) abort(); }`), and while they seem similarly split on `test` vs. `and`, they both surprised me by agreeing to address the register via `%dil`, not `%edi`.
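For reference, the generated code looks roughly like this (a hand-written sketch of its shape, not verbatim compiler output; the label name is mine):

```asm
TEST:                           # X arrives in %rdi
        test    $2, %dil        # ZF=0 iff bit 1 (mask 2) of X is set
        jne     .Lcall_abort
        ret
.Lcall_abort:
        push    %rax            # dummy push: re-aligns the stack for the call
        call    abort           # noreturn
```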
What might be the reason for this?
Both ways (`%edi` or `%dil`) have equal performance; the only difference is code-size. Reading a low-8 partial register like `%dil` never has any penalty, unlike `test $2, %bh` for example: reading a high-8 register has extra latency on Haswell and later, but it still saves code-size and doesn't hurt front-end throughput.
There is no `test $sign_extended_imm8, r/m32`, so it saves code-size to use 8-bit operand-size, even though that requires a REX prefix to encode DIL. (https://www.felixcloutier.com/x86/test)
Since the value of `X` isn't needed after the test, you actually could use `and $imm8, %edi` (3 bytes) to save code-size, but `and`/`jnz` can't macro-fuse on AMD CPUs, or on Intel before Sandybridge, so compilers prefer to only write FLAGS. I suspect nobody's implemented the peephole optimization of using `and` instead of `test` with `-mtune=sandybridge` when the register isn't needed later; the two candidate sequences are sketched below.
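To make the trade-off concrete, here is a sketch of the two sequences (the `.Labort` label is made up; byte counts match the listing below):

```asm
        and     $2, %edi        # 3 bytes, but clobbers %edi; and+jnz doesn't
        jnz     .Labort         #   macro-fuse on AMD or pre-Sandybridge Intel

        test    $2, %dil        # 4 bytes, %edi preserved; test+jnz can
        jnz     .Labort         #   macro-fuse wherever fusion exists at all
```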
The encodings, disassembled in Intel syntax:

```
   0:  f7 c7 02 00 00 00    test edi,0x2   # imm32 = 2
   6:  40 f6 c7 02          test dil,0x2   # REX prefix with no bits set
   a:  f6 c7 02             test bh,0x2    # same as above but without REX
   d:  83 e7 02             and  edi,0x2
  10:  40 80 e7 02          and  dil,0x2
  14:  80 e7 02             and  bh,0x2
  17:  0f ba e7 01          bt   edi,0x1   # bit index 1 = mask 2; can't macro-fuse with JCC
```
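If you want to reproduce the listing, the AT&T-syntax source is just the following (assembled with GNU `as` and disassembled with `objdump -d -Mintel`; the exact tooling is an assumption):

```asm
        test    $2, %edi
        test    $2, %dil
        test    $2, %bh         # BH is unencodable in an instruction with a REX prefix
        and     $2, %edi
        and     $2, %dil
        and     $2, %bh
        bt      $1, %edi        # bit index, not mask
```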