[SOLVED] Assembly registers in 64-bit architecture

With the old names all registers remain the same size, just like when x86-16 was extended to x86-32. To access the 64-bit integer registers you use the new names with an R prefix, such as rax, rbx...

Register names don't change, so you still use the byte registers (al, bl, cl, dl, ah, bh, ch, dh) for the least and most significant bytes (LSB and MSB) of ax, bx, cx, dx like before. (How do AX, AH, AL map onto EAX?)

There are also 8 new registers called r8-r15. You can access their LSBs by adding the suffix b if you're using AMD or l if you're using Intel, though Intel too is moving towards using b. For example r8b, r9b, r10l, r11l... You can also access the LSB of esi, edi, esp, ebp under the names sil, dil, spl, bpl, which requires the new REX prefix, but you cannot use them in the same instruction as ah, bh, ch or dh.
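To make the REX restriction concrete, here is a rough Python sketch that checks whether a set of register operands can appear in one instruction. The helper names (needs_rex, can_combine) are made up for illustration, and the rule is simplified: it treats every 64-bit register as needing REX (a few instructions such as push/pop default to 64-bit without it) and does not model APX at all.

```python
# Simplified model: an instruction encoded with a REX prefix cannot
# address ah, bh, ch or dh. Names below are hypothetical helpers.
HIGH_BYTE = {"ah", "bh", "ch", "dh"}
REX_ONLY_BYTE = {"sil", "dil", "spl", "bpl"}
LEGACY_64 = {"rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", "rsp"}


def needs_rex(reg):
    """True if using this register forces a REX prefix (simplified)."""
    reg = reg.lower()
    if reg in REX_ONLY_BYTE or reg in LEGACY_64:
        return True
    # r8-r15 in any width: r8, r8d, r8w, r8b, r8l
    digits = reg[1:].rstrip("dwbl")
    return reg.startswith("r") and digits.isdigit() and 8 <= int(digits) <= 15


def can_combine(*regs):
    """True if the operands do not mix a high-byte register with REX."""
    uses_high = any(r.lower() in HIGH_BYTE for r in regs)
    uses_rex = any(needs_rex(r) for r in regs)
    return not (uses_high and uses_rex)


print(can_combine("ah", "al"))   # True:  mov ah, al is fine
print(can_combine("ah", "sil"))  # False: sil needs REX
print(can_combine("ah", "r8b"))  # False: r8b needs REX
print(can_combine("al", "sil"))  # True:  no high-byte register involved
```

An assembler will reject the False cases; the usual workaround is to go through a different register or use al/bl/cl/dl instead of the high-byte form.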

Likewise the new registers' lowest word or double word can be accessed through the suffix w or d. Writing a 32-bit register zero-extends into the full 64-bit register, unlike writing low-8, high-8, or low-16 partial registers where the 8086 / 386 semantics still apply.
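The difference between 32-bit writes and the narrower partial writes is easy to get wrong, so here is a small Python model of it. This is only a sketch of the architectural result (the function name write_partial is made up), not of how any CPU implements it.

```python
MASK64 = (1 << 64) - 1


def write_partial(reg64, value, width, high8=False):
    """Return the full 64-bit register after writing `value` to a sub-register.

    width: 64, 32, 16 or 8; high8=True selects ah/bh/ch/dh (bits 8-15).
    """
    if high8 and width != 8:
        raise ValueError("high8 only applies to 8-bit writes")
    if width == 64:
        return value & MASK64
    if width == 32:
        # Writing eax, r8d, ... zero-extends into the whole 64-bit register.
        return value & 0xFFFFFFFF
    if width == 16:
        # Writing ax, r8w, ... leaves bits 16-63 untouched.
        return (reg64 & ~0xFFFF & MASK64) | (value & 0xFFFF)
    if width == 8:
        # Writing al/r8b (or ah with high8=True) leaves all other bits untouched.
        shift = 8 if high8 else 0
        return (reg64 & ~(0xFF << shift) & MASK64) | ((value & 0xFF) << shift)
    raise ValueError("width must be 64, 32, 16 or 8")


rax = 0x1122334455667788
print(hex(write_partial(rax, 0xAABBCCDD, 32)))       # 0xaabbccdd
print(hex(write_partial(rax, 0xBEEF, 16)))           # 0x112233445566beef
print(hex(write_partial(rax, 0x99, 8)))              # 0x1122334455667799
print(hex(write_partial(rax, 0x99, 8, high8=True)))  # 0x1122334455669988
```

Only the 32-bit case discards the upper half; the 8-bit and 16-bit cases merge into the old value, which is the 8086 / 386 behaviour mentioned above.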

Update: Intel has just introduced a new extension for x86-64 called APX which adds 16 more registers named r16-r31

So the list of general-purpose registers is like this:

64-bit register	Lower 32 bits	Lower 16 bits	Lower 8 bits
rax	eax	ax	al
rbx	ebx	bx	bl
rcx	ecx	cx	cl
rdx	edx	dx	dl
rsi	esi	si	sil
rdi	edi	di	dil
rbp	ebp	bp	bpl
rsp	esp	sp	spl
r8	r8d	r8w	r8b (r8l)
r9	r9d	r9w	r9b (r9l)
r10	r10d	r10w	r10b (r10l)
r11	r11d	r11w	r11b (r11l)
r12	r12d	r12w	r12b (r12l)
r13	r13d	r13w	r13b (r13l)
r14	r14d	r14w	r14b (r14l)
r15	r15d	r15w	r15b (r15l)
r16 (with APX)	r16d	r16w	r16b (r16l)
r17 (with APX)	r17d	r17w	r17b (r17l)
...	...	...	...
r31 (with APX)	r31d	r31w	r31b (r31l)
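If you need the table programmatically, the naming rule can be sketched in a few lines of Python. The function name subregisters is made up; it only covers the low-8 column (not ah, bh, ch, dh), uses the b suffix for the numbered registers, and accepts r16-r31 on the assumption that APX is available.

```python
# (32-bit, 16-bit, low 8-bit) names for the eight legacy registers.
LEGACY = {
    "rax": ("eax", "ax", "al"),
    "rbx": ("ebx", "bx", "bl"),
    "rcx": ("ecx", "cx", "cl"),
    "rdx": ("edx", "dx", "dl"),
    "rsi": ("esi", "si", "sil"),
    "rdi": ("edi", "di", "dil"),
    "rbp": ("ebp", "bp", "bpl"),
    "rsp": ("esp", "sp", "spl"),
}


def subregisters(reg64):
    """Return the (32-bit, 16-bit, low 8-bit) names of a 64-bit register."""
    name = reg64.lower()
    if name in LEGACY:
        return LEGACY[name]
    number = name[1:]
    # r8-r15 are plain x86-64; r16-r31 exist only with APX.
    if name.startswith("r") and number.isdigit() and number == str(int(number)):
        if 8 <= int(number) <= 31:
            return (name + "d", name + "w", name + "b")
    raise ValueError("not a 64-bit general-purpose register: " + reg64)


print(subregisters("rsi"))  # ('esi', 'si', 'sil')
print(subregisters("r10"))  # ('r10d', 'r10w', 'r10b')
```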

Of course there are also other types of registers like control, debug, flag, floating-point, vector, segment, test... registers. For more details check https://wiki.osdev.org/CPU_Registers_x86. See also What are the names of the new X86_64 processors registers?

Calling conventions

Regarding the calling convention, on each specific system there's only one convention¹. Follow the links for details on which integer and vector registers are call-clobbered vs. call-preserved, and additional details like 16-byte alignment of RSP before a call, and quirks for variadic functions.

On Windows:
- RCX, RDX, R8, R9 for the first four arguments if they're integer or pointer
- XMM0, XMM1, XMM2, XMM3 for floating-point arguments
- The caller reserves 32 bytes of "shadow space" above the return address. (If there are any stack args, they go above the shadow space.)

¹Since MSVC 2013 there's also a new extended convention on Windows called __vectorcall so the "single convention policy" is not true anymore.

On Linux and other systems that follow the System V AMD64 ABI, more arguments can be passed in registers and there's a 128-byte red zone below the stack pointer which may make leaf functions faster.
- The first six integer or pointer arguments are passed in registers RDI, RSI, RDX, RCX, R8, and R9
- Floating-point arguments are passed in XMM0 through XMM7
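
To compare the two register lists, here is a simplified Python sketch that assigns scalar arguments to registers. The function names are made up, and it only handles plain "int" (integer or pointer) and "float" arguments; structs, variadic functions and __vectorcall are ignored. One detail not spelled out above: on Windows the register is chosen by argument position (the integer and XMM slots are shared), while System V counts integer and floating-point arguments separately.

```python
WIN_INT = ["rcx", "rdx", "r8", "r9"]
WIN_FP = ["xmm0", "xmm1", "xmm2", "xmm3"]
SYSV_INT = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
SYSV_FP = ["xmm%d" % i for i in range(8)]


def _check(kind):
    if kind not in ("int", "float"):
        raise ValueError("kind must be 'int' or 'float'")


def assign_windows(kinds):
    """Windows x64: the i-th argument uses the i-th slot, whatever its kind."""
    out = []
    for i, kind in enumerate(kinds):
        _check(kind)
        if i < 4:
            out.append(WIN_FP[i] if kind == "float" else WIN_INT[i])
        else:
            out.append("stack")
    return out


def assign_sysv(kinds):
    """System V AMD64: separate counters for integer and FP registers."""
    out = []
    next_int = next_fp = 0
    for kind in kinds:
        _check(kind)
        if kind == "float" and next_fp < len(SYSV_FP):
            out.append(SYSV_FP[next_fp])
            next_fp += 1
        elif kind == "int" and next_int < len(SYSV_INT):
            out.append(SYSV_INT[next_int])
            next_int += 1
        else:
            out.append("stack")
    return out


args = ["int", "float", "int", "float", "int"]
print(assign_windows(args))  # ['rcx', 'xmm1', 'r8', 'xmm3', 'stack']
print(assign_sysv(args))     # ['rdi', 'xmm0', 'rsi', 'xmm1', 'rdx']
```

Treat this as a way to remember the register order, not as a replacement for the ABI documents.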

For more information you should read x86-64 and x86-64 calling conventions

There's also a convention used in Plan 9 where

All registers are caller-saved

All parameters are passed on the stack

Return values are also returned on the stack, in space reserved below (stack-wise; higher addresses on amd64) the arguments.

Golang follows the Plan 9 calling convention, but since Go 1.17 they've been gradually introducing a register-based calling convention for better performance. The calling convention can change in the future, and the compiler can generate stubs to automatically call assembly functions in older conventions. At the moment the ABI specifies that

9 general-purpose registers will be used to pass integer arguments: RAX, RBX, RCX, RDI, RSI, R8, R9, R10, R11
15 registers XMM0-XMM14 are used for floating-point arguments

In fact Plan 9 was always a weirdo. For example it forces a register to be 0 on RISC architectures without a hardware zero register. x86 register names on it are also consistent across 16, 32 and 64-bit x86 architectures with operand size indicated by mnemonic suffix. That means ax can be a 16, 32 or 64-bit register depending on the instruction suffix. If you're curious about it read

OTOH Itanium is a completely different architecture and has no relation to x86-64 whatsoever. It's a pure 64-bit architecture so all normal registers are 64-bit, no 32-bit or smaller version is available. There are a lot of registers in it:

128 general-purpose integer registers r0 through r127, each carrying 64 value bits and a trap bit. We'll learn more about the trap bit later.

128 floating point registers f0 through f127.

64 predicate registers p0 through p63.

8 branch registers b0 through b7.

An instruction pointer, which the Windows debugging engine for some reason calls iip. (The extra "i" is for "insane"?)

128 special-purpose registers, not all of which have been given meanings. These are called "application registers" (ar) for some reason. I will cover selected register as they arise during the discussion.

Other miscellaneous registers we will not cover in this series.

The Itanium processor, part 1: Warming up