assembly x86-64 att instructions immediate-operand

Difference between movq and movabsq in x86-64

I'm talking about data movement instructions in the x86-64 Intel architecture. I have read that the regular movq instruction can only have immediate source operands that can be represented as 32-bit two's complement numbers, while the movabsq instruction can have an arbitrary 64-bit immediate value as its source operand and can only have a register as a destination.

Could you please elaborate on this? Does that mean I can move a 64-bit immediate value only with the movabsq instruction, and only from an immediate to a register? I don't see how I can move a 64-bit immediate value to memory. Or maybe I'm missing something important here.

Solution

Unless your 64-bit value can be encoded as a 32-bit-sign-extended immediate, you have to move it to a register first and then store. (Or do two separate 32-bit stores, or other worse workaround to get the bytes where you want them.)

In NASM / Intel syntax, mov r64, 0x... picks a MOV encoding based on the constant. There are four to choose from with immediate operands:

5 byte mov r32, imm32. (zero-extended to fill the 64-bit register like always). AT&T: mov/movl
6+ byte mov r/m32, imm32. only useful for memory destinations. AT&T: mov/movl
7+ byte mov r/m64, sign-extended-imm32. Can store 8 bytes to memory, or set a 64-bit register to a negative value. AT&T: mov/movq
10 byte mov r64, imm64. (This is the REX.W=1 version of the same no-ModRM opcode as mov r32, imm32) AT&T: movabs, or mov / movq with a wide constant.

(Byte counts are only for register destinations, or addressing modes that don't need a SIB byte or disp8/disp32: just opcode + ModR/M + imm32 like mov dword [rdi], 123)
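As a rough illustration, here are the four encodings written in GAS AT&T syntax, with the machine-code bytes I'd expect in the comments. The constants and the use of %rdi as a pointer are arbitrary choices for this sketch, not anything from the question.

```asm
# Sketch only: %rdi is assumed to hold a valid writable address.
mov    $123, %eax                   # 5 bytes:  b8 7b 00 00 00           (mov r32, imm32, zero-extends into RAX)
movl   $123, (%rdi)                 # 6 bytes:  c7 07 7b 00 00 00        (mov r/m32, imm32)
movq   $-123, %rax                  # 7 bytes:  48 c7 c0 85 ff ff ff     (mov r/m64, sign-extended imm32)
movabs $0x123456789abcdef0, %rax    # 10 bytes: 48 b8 f0 de bc 9a 78 56 34 12  (mov r64, imm64)
```

Assembling this and running objdump -d on the result is an easy way to check the lengths with your own toolchain.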

Some Intel-syntax assemblers (but not GAS unless you use as -Os or gcc -Wa,-Os) will optimize 32-bit constants like mov rax, 1 to 5-byte mov r32, imm32 (NASM does this), while others (like YASM) will use 7-byte mov r/m64, sign-extended-imm32. They both choose the imm64 encoding only for large constants, without having to use a special mnemonic.

Or with an equ constant, YASM will sometimes use the 10-byte version even with small constants, unfortunately.

In GAS with AT&T syntax

movabsq means that the machine-code encoding will contain a 64-bit value: either an immediate constant, or an absolute memory address. (There's another group of special forms of mov that load/store al/ax/eax/rax from/to an absolute address, and the 64-bit version of that uses a 64-bit absolute address, not relative. AT&T syntax calls that movabs as well, e.g. movabs 0x123456789abc0, %eax).

Even if the number is small, like movabs $1, %rax, you still get the 10-byte version.

Some of this is mentioned in a "what's new in x86-64" guide that uses AT&T syntax.

However, the mov mnemonic (with or without a q operand-size suffix) will pick between mov r/m64, imm32 and mov r64, imm64 depending on the size of the immediate. (See What's the difference between the x86-64 AT&T instructions movq and movabsq?, a followup which exists because the first version of this answer guessed wrong about what GAS did with large assemble-time constants for movq.)

But symbol addresses aren't known until link time, so they aren't available when the assembler is picking an encoding. At least when targeting Linux ELF object files, GAS assumes that if you didn't use movabs, you intended a 32-bit absolute address. (YASM does the same for mov rsi, string, using an R_X86_64_32 relocation, but NASM defaults to movabs, producing an R_X86_64_64 relocation.)

If for some reason you want to use a symbol name as an absolute immediate (instead of a normally better RIP-relative LEA), you do need movabs.

(On targets like Mach-O64 on OS X, movq $symbol, %rax may always pick the imm64 encoding, because 32-bit absolute addresses are never valid. There are some MacOS Q&As on SO where I think people said their code worked with movq to put a data address in a register.)

Example on Linux/ELF with a `$symbol` immediate

mov    $symbol, %rdi     # GAS assumes the address fits in 32 bits
movabs $symbol, %rdi     # GAS is forced to use an imm64


lea    symbol(%rip), %rdi  # 7 byte RIP-relative addressing, normally the best choice for position-independent code or code loaded outside the low 32 bits

mov    $symbol, %edi    # optimal in position-dependent code

Assembled with GAS into an object file (with .bss; symbol:), we get these relocations. Note the difference between R_X86_64_32S (signed) vs. R_X86_64_32 (unsigned) vs. R_X86_64_PC32 (PC-relative) 32-bit relocations.

0000000000000000 <.text>:
   0:   48 c7 c7 00 00 00 00    mov    $0x0,%rdi        3: R_X86_64_32S .bss
   7:   48 bf 00 00 00 00 00 00 00 00   movabs $0x0,%rdi        9: R_X86_64_64  .bss
  11:   48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # 18 <.text+0x18>  14: R_X86_64_PC32       .bss-0x4
  18:   bf 00 00 00 00          mov    $0x0,%edi        19: R_X86_64_32 .bss

Linked into a non-PIE executable (gcc -no-pie -nostdlib foo.s), we get:

4000d4:       48 c7 c7 f1 00 60 00      mov    $0x6000f1,%rdi
4000db:       48 bf f1 00 60 00 00 00 00 00   movabs $0x6000f1,%rdi
4000e5:       48 8d 3d 05 00 20 00      lea    0x200005(%rip),%rdi     # 6000f1 <__bss_start>
4000ec:       bf f1 00 60 00            mov    $0x6000f1,%edi

And of course this won't link into a PIE executable, because of the 32-bit absolute relocations. movq $symbol, %rax won't work with normal gcc foo.S on modern Linux distros; see 32-bit absolute addresses no longer allowed in x86-64 Linux?. (Remember, the right solution is RIP-relative LEA, or making a static executable, not actually using movabs.)

movq is always the 7-byte or 10-byte form, so don't use mov $1, %rax unless you want a longer instruction for alignment purposes, instead of padding with NOPs later (see What methods can be used to efficiently extend instruction length on modern x86?). Use mov $1, %eax to get the 5-byte form.

Notice that movq $0xFFFFFFFF, %rax can't use the 7-byte form, because it's not representable with a sign-extended 32-bit immediate, and needs either the imm64 encoding or the %eax destination encoding. GAS will not do this optimization for you, so you're stuck with the 10-byte encoding. You definitely want mov $0xFFFFFFFF, %eax.
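To make the size differences concrete, this is what I'd expect GAS to emit for a few variations when no assembler optimization option is in effect. Treat the byte listings as a sketch to verify with objdump -d, since optimizing assemblers may pick something shorter.

```asm
mov    $1, %rax             # 7 bytes:  48 c7 c0 01 00 00 00              (sign-extended imm32)
movabs $1, %rax             # 10 bytes: 48 b8 01 00 00 00 00 00 00 00     (imm64 forced by the mnemonic)
mov    $-1, %rax            # 7 bytes:  48 c7 c0 ff ff ff ff              (all-ones fits as sign-extended imm32)
mov    $0xFFFFFFFF, %rax    # 10 bytes: 48 b8 ff ff ff ff 00 00 00 00     (bit 31 set but upper half zero: needs imm64)
mov    $0xFFFFFFFF, %eax    # 5 bytes:  b8 ff ff ff ff                    (same result in RAX via zero-extension)
```

The last two lines leave the same value in RAX; only the encoding length differs.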

movabs with an immediate source is always the imm64 form.

(movabs can also be the MOV encoding with a 64-bit absolute address and RAX as the source or dest: like REX.W + A3 MOV moffs64, RAX).

I don't see how I can move a 64-bit immediate value to memory.

That's a separate question, and the answer is: you can't. The insn ref manual entry for MOV makes this clear: the only form that has an imm64 immediate operand only has a register destination, not r/m64.

If your value fits in a sign-extended 32-bit immediate, movq $0x123456, 32(%rdi) will do an 8-byte store to memory. The limitation is that the upper 32 bits have to be copies of bit 31, because the value has to be encodable as a sign-extended imm32.
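A minimal sketch of the options, assuming %rdi points to at least 40 writable bytes and %rax is free as a scratch register (both are assumptions for this example):

```asm
# Values that fit in a sign-extended imm32 can be stored directly.
movq   $0x123456, 32(%rdi)          # upper 32 bits are zero, bit 31 is 0
movq   $-2, 32(%rdi)                # stores 0xFFFFFFFFFFFFFFFE

# movq $0x80000000, 32(%rdi)        # should not assemble: bit 31 is set but the upper 32 bits are zero

# A value that doesn't fit: go through a register...
movabs $0x123456789abcdef0, %rax    # mov r64, imm64
mov    %rax, 32(%rdi)               # 8-byte store

# ...or use two 32-bit stores (little-endian, so the low half goes at the lower address).
movl   $0x9abcdef0, 32(%rdi)        # low dword
movl   $0x12345678, 36(%rdi)        # high dword
```

The two-store version isn't a single atomic 8-byte store, which matters if another thread could observe the location mid-update.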

why we can't move a 64-bit immediate value to memory? - computer architecture / ISA design reasons.
How to load address of function or label into register (use 5-byte mov r32, imm32 as an optimization, or RIP-relative LEA for any case except a large memory model where a symbol might be more than 2GiB away.)
