If I have an address in the rbx register and use an instruction like
mov rax, [rbx+1]
Is rbx+1 computed in hardware at runtime? If so, are some registers used for it, or is there a dedicated piece of hardware?
I figured doing the same instruction but with a symbol instead of a register like so
string: db "I'm lost", 0
mov rax, [string+1]
would allow the calculation to be done at compile-time since it already has a location in memory reserved, whereas the value in rbx is variable and unknown until runtime.
All CPUs, even the original 8086, have had some temporary buffers separate from the architectural registers. 8086 used the main ALU for address math, so add ax, [bx + si + 1] would need to use that temporary storage; the address math doesn't affect the software-visible value in the BX or RBX register.
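To make the "temporary storage" point concrete, here is a toy C model of add ax, [bx + si + 1]. It is only a sketch under loud assumptions: the struct Regs8086 and the function name are made up for illustration, memory is a flat 64 KiB array, and segmentation and flags are ignored. It says nothing about how the real 8086 microcode is organised; it only shows that the effective address lives in a scratch value while BX and SI stay untouched.

```c
#include <stdint.h>

/* Hypothetical toy model of the 8086 register file (only what we need). */
struct Regs8086 {
    uint16_t ax, bx, si;
};

/* Models `add ax, [bx + si + 1]` over a flat 64 KiB memory.
 * Assumption: segmentation and flags are deliberately ignored. */
static void add_ax_mem_bx_si_1(struct Regs8086 *r, const uint8_t mem[65536])
{
    /* The effective address goes into a temporary, not into BX or SI.
     * 16-bit address math wraps around at 64 KiB. */
    uint16_t ea = (uint16_t)(r->bx + r->si + 1u);

    /* Little-endian 16-bit load from the computed address. */
    uint16_t lo = mem[ea];
    uint16_t hi = mem[(uint16_t)(ea + 1u)];
    uint16_t operand = (uint16_t)(lo | (hi << 8));

    /* Only AX, the destination, is architecturally modified. */
    r->ax = (uint16_t)(r->ax + operand);
}
```

After the call, bx and si hold exactly what they held before; the sum bx + si + 1 never becomes software-visible.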
Old CPUs like 8086 handled even simple x86 instructions by running a sequence of internal microcode instructions. Modern CPUs decode an instruction like mov eax, [rbx+1] to a single micro-op (uop) for a load execution unit. (They do still have buffering between pipeline stages, and even some temporary registers for instructions like xchg eax, ecx to use; on Intel that's a 3-uop instruction that's something like mov internal_tmp, ecx / mov ecx, eax / mov eax, internal_tmp.)
Modern CPUs have dedicated address-generation units (AGUs) as part of load and store-address execution units, separate from their ALU execution units. See https://realworldtech.com/sandy-bridge/10 for a block diagram.
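As for the symbol case in the question, a rough C analogy may help. For [string+1] the toolchain can fold string+1 into one constant address before the program runs, while for [rbx+1] the add has to happen when the load executes; in both cases the load itself is still a runtime operation. The names string_plus_1 and load_qword_at_plus_1 are made up for this sketch, and what a C compiler actually emits for it is up to the compiler.

```c
#include <stdint.h>
#include <string.h>

/* Counterpart of `string: db "I'm lost", 0` (9 bytes with the terminator). */
static const char string[] = "I'm lost";

/* `string + 1` is an address constant: C accepts it in a static
 * initializer because it can be resolved before the program runs. */
static const char *const string_plus_1 = string + 1;

/* Counterpart of `mov rax, [rbx+1]`: the base is only known at run time,
 * so the +1 has to be applied when the load executes. */
static uint64_t load_qword_at_plus_1(const char *base)
{
    uint64_t value;
    memcpy(&value, base + 1, sizeof value);  /* alignment-safe 8-byte load */
    return value;
}
```

Calling load_qword_at_plus_1(string) reads the same 8 bytes that mov rax, [string+1] would: the characters 'm lost plus the terminating zero.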
Related: some CPUs optimistically assume the [rbx+1] value is in the same page as [rbx], if the rbx value is forwarded from another load. This cuts down load-use latency for pointer-chasing (e.g. linked lists and binary trees) to 4 cycles vs. the usual 5, by letting TLB access start sooner.
Note that LEA is a separate animal; its result is written to a register, not used for a load or store, so modern CPUs run it on an ALU execution unit as just a shift-and-add instruction. (Some of the ALUs that support it don't support the shift part, or only support one add, depending on the CPU model.) See Using LEA on values that aren't addresses / pointers? - although this is true regardless of whether the integer value happens to be a valid pointer or not.
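Both ideas can be sketched in C, with the caveat that this is only an illustration: struct Node, sum_list and scaled_sum are hypothetical names, and the instructions mentioned in the comments are what a compiler might plausibly emit, not something C guarantees.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list node; `next` sits at a small offset from the node. */
struct Node {
    int64_t value;
    struct Node *next;
};

/* Pointer chasing: the address of each `n->next` load depends on the
 * result of the previous load, so load-use latency bounds this loop. */
static int64_t sum_list(const struct Node *n)
{
    int64_t sum = 0;
    while (n != NULL) {
        sum += n->value;
        n = n->next;  /* a compiler might emit something like mov rdi, [rdi+8] */
    }
    return sum;
}

/* LEA-style arithmetic on plain integers: base + index*8 + 4.
 * Nothing is dereferenced; it is just a shift and two adds. */
static uint64_t scaled_sum(uint64_t base, uint64_t index)
{
    return base + (index << 3) + 4;
}
```

In sum_list the base register of every load comes straight from the previous load, which is the forwarding case described above. scaled_sum never touches memory, so it is the kind of expression that can be handled by LEA on an ALU port regardless of whether its inputs are valid pointers.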