To set x to zero (x = 0), my CS:APP book shows two ways.
First:
xorq %rcx, %rcx
Second:
movq $0, %rcx
It also says that the first one takes only 3 bytes, while the second one takes 7 bytes.
How do the two ways work? Why does the first one take fewer bytes than the second one?
Because mov needs more space to encode its 32-bit immediate source operand. xor only needs the ModRM byte to encode its operands.
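As a rough illustration of where the bytes go, here is a sketch of the two instructions from the question, with the machine code that GAS emits by default (no -O option) shown in the comments. The file name and the label are made up for this example; assembling it and running objdump -d on the result is the way to check the bytes yourself.

```asm
# zero64.s: the two 64-bit forms from the question (AT&T syntax).
# Assemble and inspect:  as zero64.s -o zero64.o && objdump -d zero64.o
        .text
        .globl  zero_rcx            # illustrative label, not from the book
zero_rcx:
        # 48            REX.W prefix: 64-bit operand size
        # 31            opcode: XOR r/m, r
        # c9            ModRM: mod=11 (register), reg=rcx, r/m=rcx
        xorq    %rcx, %rcx          # 48 31 c9              (3 bytes)

        # 48            REX.W prefix: 64-bit operand size
        # c7            opcode: MOV r/m, imm32 (sign-extended to 64 bits)
        # c1            ModRM: mod=11 (register), reg=/0 opcode extension, r/m=rcx
        # 00 00 00 00   the 32-bit immediate
        movq    $0, %rcx            # 48 c7 c1 00 00 00 00  (7 bytes)
```

Both instructions spend one byte each on the prefix, the opcode, and ModRM; the 4-byte difference is exactly the immediate that mov has to carry.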
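Neither one needs a REX prefix, because writing a 32-bit register implicitly zeroes the upper half of the full 64-bit register. So you should be comparing 2-byte xor %ecx,%ecx against 5-byte mov $0, %ecx. (See: Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?)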
GAS doesn't do this optimization for you, so movq gives you the 7-byte mov $sign_extended_imm32, %r/m64 encoding instead of the special-case 5-byte mov $imm32, %r32 encoding that omits the ModRM byte. (Unless you use as -O2, in which case it will optimize the operand-size like NASM does. Note that gcc -O2 -c foo.s does not pass optimization options on to as.)
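A minimal sketch of the 32-bit forms written by hand, which is what you would want the assembler to emit. The file name and label are hypothetical; the byte comments assume plain as with no -O option.

```asm
# zero32.s: 32-bit operand-size forms. Writing %ecx zero-extends into %rcx,
# so both of these leave the full 64-bit %rcx equal to zero.
# Assemble and inspect:  as zero32.s -o zero32.o && objdump -d zero32.o
        .text
        .globl  zero_ecx            # illustrative label
zero_ecx:
        # 31 c9: opcode + ModRM only; no REX prefix, no immediate
        xorl    %ecx, %ecx          # 31 c9                 (2 bytes)

        # b9: opcode b8+rd with the register number (1 = ecx) folded in,
        # so there is no ModRM byte; the 32-bit immediate follows
        movl    $0, %ecx            # b9 00 00 00 00        (5 bytes)
```

Going by the note above about as -O2, assembling movq $0, %rcx with that option should produce the same 5-byte encoding as the movl line here; disassembling the object file is an easy way to confirm it with your binutils version.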
(As noted in "CS:APP example uses idivq with two operands?", CS:APP seems to be full of asm mistakes. This one isn't an invalid-syntax mistake, just a missed optimization.)
There is unfortunately no encoding of mov with a sign-extended 8-bit immediate; otherwise we could have a 3-byte mov reg, imm8 (https://www.felixcloutier.com/x86/mov). (I'm surprised no iteration of x86-64 has repurposed one of the opcode bytes it freed up for a nice mov encoding like that, maybe lumped in with BMI1 or something.)
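For contrast, the classic ALU instructions do have a sign-extended imm8 form, which is what a 3-byte mov would look like if it existed. A small sketch (file name and label are made up; byte comments assume default GAS output):

```asm
# imm8.s: ALU instructions have a sign-extended imm8 form; mov does not.
# Assemble and inspect:  as imm8.s -o imm8.o && objdump -d imm8.o
        .text
        .globl  imm8_demo           # illustrative label
imm8_demo:
        # 83 /0 ib: ADD r/m32, imm8 = opcode + ModRM + 1-byte immediate
        addl    $1, %ecx            # 83 c1 01              (3 bytes)

        # b8+rd id: MOV r32, imm32 always carries a full 4-byte immediate
        movl    $1, %ecx            # b9 01 00 00 00        (5 bytes)
```

The assembler picks the short 83 form automatically whenever the immediate fits in a signed byte; for mov there is simply no such form to pick.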
For more details on x86 instruction encoding, read Intel's vol. 2 manual and look at disassembly. https://wiki.osdev.org/X86-64_Instruction_Encoding is a nice overview that's less verbose than Intel's manual.
See also "What is the best way to set a register to zero in x86 assembly: xor, mov or and?" for more details about why xor-zeroing is optimal: on some CPUs, notably P6-family and Sandybridge-family, it has microarchitectural advantages over mov besides simply code size.