I have a memory location that contains a character that I want to compare with another character (and it's not at the top of the stack, so I can't just pop it). How do I reference the contents of a memory location so I can compare it?
Basically, how do I do it syntactically?
Intel's and AMD's manuals have whole sections on the details of the encodings of ModRM (and the optional SIB and disp8/disp32 bytes), which make it clear what's encodable and why the limits exist.
See also: table of AT&T(GNU) syntax vs. NASM syntax for different addressing modes, including indirect jumps / calls. Also see the collection of links at the bottom of this answer.
x86 (32 and 64bit) has several addressing modes to choose from. They're all of the form:
[base_reg + index_reg*scale + displacement] ; or a subset of this
[RIP + displacement] ; or RIP-relative: 64bit only. No index reg is allowed
(where scale is 1, 2, 4, or 8, and displacement is a signed 32-bit constant). All the other forms (except RIP-relative) are subsets of this that leave out one or more components. This means you don't need a zeroed index_reg to access [rsi], for example.
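As a rough illustration of the general form above, here is a minimal C model of the address calculation. It is only a sketch: the function name effective_address is made up for this example, and it models 64-bit address size with wraparound, not segment bases or RIP-relative addressing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: models [base + index*scale + disp] in 64-bit mode.
   A component that is left out of the addressing mode is just 0 here. */
uint64_t effective_address(uint64_t base, uint64_t index,
                           unsigned scale, int32_t disp)
{
    /* The machine code stores the scale as a 2-bit shift count. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    /* The displacement is sign-extended to address width before the add. */
    return base + index * scale + (uint64_t)(int64_t)disp;
}
```

For example, effective_address(0x1000, 3, 4, 8) models [base + index*4 + 8] with base = 0x1000 and index = 3, and effective_address(0x1000, 0, 1, 0) models a plain [base].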
In asm source code, it doesn't matter what order you write things: [5 + rax + rsp + 15*4 + MY_ASSEMBLER_MACRO*2] works fine. (All the math on constants happens at assemble time, resulting in a single constant displacement.)
The registers all have to be the same size as each other, and the same size as the mode you're in unless you use an alternate address-size, which requires an extra prefix byte. Narrow pointers are rarely useful outside of the x32 ABI (ILP32 in long mode), where you might want to ignore the top 32 bits of a register, e.g. instead of using movsxd to sign-extend a 32-bit possibly-negative offset in a register to 64-bit pointer width.
If you want to use al as an array index, for example, you need to zero- or sign-extend it to pointer width. (Having the upper bits of rax already zeroed before messing around with byte registers is sometimes possible, and is a good way to accomplish this.)
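The difference between the two kinds of extension can be sketched in C. The helper names below are invented for this example; they model what movzx and movsx do to a byte before it is used as an index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Like movzx rax, al: the byte is treated as unsigned, 0..255. */
uint64_t index_zero_extended(uint8_t al)
{
    return (uint64_t)al;
}

/* Like movsx rax, al: the byte is treated as signed, -128..127.
   Assumes the usual two's-complement conversion (true for GCC, Clang, MSVC). */
int64_t index_sign_extended(uint8_t al)
{
    return (int64_t)(int8_t)al;
}

/* Hypothetical table lookup using a zero-extended byte index. */
uint8_t table_lookup(const uint8_t *table, size_t len, uint8_t al)
{
    uint64_t index = index_zero_extended(al);
    assert(index < len);   /* stay inside the table */
    return table[index];
}
```

Which one you want depends on whether the byte is meant as an unsigned index or a signed offset.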
The limitations reflect what's encodable in machine code, as usual for assembly language. The scale factor is a 2-bit shift count. The ModRM (and optional SIB) bytes can encode up to 2 registers but not more, and don't have any modes that subtract registers, only add. Any register can be a base. Any register except ESP/RSP can be an index. See rbp not allowed as SIB base? for the encoding details, like why [rsp] always needs a SIB byte.
Every possible subset of the general case is encodable, except ones using e/rsp*scale (obviously useless in "normal" code that always keeps a pointer to stack memory in esp).
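Normally, the code-size of the encodings is one ModRM byte, plus one SIB byte for modes with an index register, plus a displacement of 0, 1, or 4 bytes. Displacements in the range [-128 to +127] can use the more compact disp8 encoding, saving 3 bytes vs. disp32.
ModRM is always present, and its bits signal whether a SIB is also present. Similar for disp8/disp32. Code-size exceptions: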
[reg*scale] by itself can only be encoded with a 32-bit displacement (which can of course be zero). Smart assemblers work around that by encoding lea eax, [rdx*2] as lea eax, [rdx + rdx], but that trick only works for scaling by 2. Either way a SIB byte is required, in addition to ModRM.
It's impossible to encode e/rbp or r13 as the base register without a displacement byte, so [ebp] is encoded as [ebp + byte 0]. The no-displacement encodings with ebp as a base register instead mean there's no base register (e.g. for [disp + reg*scale]).
[e/rsp] requires a SIB byte even if there's no index register (whether or not there's a displacement). The mod/rm encoding that would specify [rsp] instead means that there's a SIB byte.
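The displacement-size rules and the e/rbp and r13 exception can be summarized in a small C sketch. The function disp_bytes and its parameters are hypothetical names for this example, and RIP-relative addressing (always disp32) is deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: how many displacement bytes a 32/64-bit addressing
   mode needs. base_is_bp_or_r13 is true for EBP/RBP/R13 as the base, which
   have no encoding without a displacement. */
unsigned disp_bytes(bool has_base, bool base_is_bp_or_r13, int32_t disp)
{
    assert(has_base || !base_is_bp_or_r13);
    if (!has_base)
        return 4;   /* [disp32] and [index*scale + disp32] always use disp32 */
    if (disp == 0 && !base_is_bp_or_r13)
        return 0;   /* plain [reg] or [reg + index*scale] */
    if (disp >= -128 && disp <= 127)
        return 1;   /* disp8, sign-extended by the CPU */
    return 4;       /* disp32 */
}
```

So [ebp] costs one displacement byte, [esi + 127] costs one, [esi + 128] costs four, and [esi*4] costs four even though the displacement is zero.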
See Table 2-5 in Intel's ref manual, and the surrounding section, for the details on the special cases. (They're the same in 32 and 64bit mode. Adding RIP-relative encoding didn't conflict with any other encoding, even without a REX prefix.)
For performance, it's typically not worth it to spend an extra instruction just to get smaller x86 machine code. On Intel CPUs with a uop cache, that cache is smaller than the L1 instruction cache and is the more precious resource. Minimizing fused-domain uops is typically more important.
(This question was tagged MASM, but some of this answer talks about NASM's version of Intel syntax, especially where they differ for x86-64 RIP-relative addressing. AT&T syntax is not covered, but keep in mind that's just another syntax for the same machine code so the limitations are the same.)
This table doesn't exactly match the hardware encodings of possible addressing modes, since I'm distinguishing between using a label (for e.g. global or static data) vs. using a small constant displacement. So I'm covering hardware addressing modes + linker support for symbols.
(Note: usually you'd want movzx eax, byte [esi] or movsx when the source is a byte, but mov al, byte_src does assemble and is common in old code, merging into the low byte of EAX/RAX. See Why doesn't GCC use partial registers? and How to isolate byte and word array elements in a 64-bit register.)
If you have an int* and an element index instead of a byte offset, you'd often use the scale factor to scale the index by the array element size. (Prefer byte offsets or pointers to avoid indexed addressing modes, for code-size reasons and for performance in some cases, especially on Intel CPUs where indexed modes can hurt micro-fusion.) But you can also do other things.
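To make the role of the scale factor concrete, here is a C sketch of what a load like mov eax, [esi + ecx*4] computes when esi holds an int32_t pointer and ecx an element index. The name load_scaled is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of mov eax, [esi + ecx*4]:
   array plays the role of esi, index the role of ecx. */
int32_t load_scaled(const int32_t *array, size_t index)
{
    assert(array != NULL);
    const unsigned char *base = (const unsigned char *)array;
    /* Address arithmetic is in bytes, so the element index is scaled by 4. */
    const unsigned char *addr = base + index * sizeof(int32_t);
    int32_t value;
    memcpy(&value, addr, sizeof value);   /* the 4-byte load itself */
    return value;
}
```

This returns the same value as array[index] in C, which hides the multiplication by the element size that the asm has to spell out.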
If you have a pointer char *array in esi:
mov al, esi: invalid, won't assemble. Without square brackets, it's not a load at all. It's an error because the registers aren't the same size.
mov al, [esi] loads the byte pointed to, i.e. array[0] or *array.
mov al, [esi + ecx] loads array[ecx].
mov al, [esi + 10] loads array[10].
mov al, [esi + ecx*8 + 200] loads array[ecx*8 + 200].
mov al, [global_array + 10] loads from global_array[10]. In 64-bit mode, this can and should be a RIP-relative address. Using NASM DEFAULT REL is recommended, to generate RIP-relative addresses by default instead of having to always use [rel global_array + 10]. MASM does this by default, I think. There is no way to use an index register with a RIP-relative address directly. The normal method is lea rax, [global_array] / mov al, [rax + rcx*8 + 10] or similar.
See How do RIP-relative variable references like "[RIP + _a]" in x86-64 GAS Intel-syntax work? for more details, and syntax for GAS .intel_syntax, NASM, and GAS AT&T syntax.
mov al, [global_array + ecx + edx*2 + 10] loads from global_array[ecx + edx*2 + 10]. Obviously you can index a static/global array with a single register. Even a 2D array using two separate registers is possible (pre-scaling one with an extra instruction, for scale factors other than 2, 4, or 8). Note that the global_array + 10 math is done at link time. The object file (assembler output, linker input) informs the linker of the +10 to add to the final absolute address, to put the right 4-byte displacement into the executable (linker output). This is why you can't use arbitrary expressions on link-time constants that aren't assemble-time constants (e.g. symbol addresses).
In 64-bit mode, this still needs the global_array as a 32-bit absolute address for the disp32 part, which only works in a position-dependent Linux executable, or largeaddressaware=no Windows.
mov al, 0ABh: not a load at all, but instead an immediate constant that is stored inside the instruction. (Note that you need to prefix a 0 so the assembler knows it's a constant, not a symbol. Some assemblers will also accept 0xAB, and some of those won't accept 0ABh.)
You can use a symbol as the immediate constant, to get an address into a register:
NASM: mov esi, global_array assembles into a mov esi, imm32 that puts the address into esi.
MASM: mov esi, OFFSET global_array is needed to do the same thing.
MASM: mov esi, global_array assembles into a load: mov esi, dword [global_array].
In 64-bit mode, the standard way to put a symbol address into a register is a RIP-relative LEA. Syntax varies by assembler. MASM does it by default. NASM needs a default rel directive, or [rel global_array]. GAS needs it explicitly in every addressing mode. See How to load address of function or label into register. mov r64, imm64 is usually supported too, for 64-bit absolute addressing, but is normally the slowest option (code size creates front-end bottlenecks). So mov rdi, format_string / call printf typically works in NASM, but is not efficient.
As an optimization when addresses can be represented as a 32-bit absolute (instead of as a rel32 offset from the current position), mov reg, imm32 is still optimal just like in 32-bit code. (Linux non-PIE executable or Windows with LargeAddressAware=no). But note that in 32-bit mode, lea eax, [array] is not efficient: it wastes a byte of code-size (ModRM + absolute disp32) and can't run on as many execution ports as mov eax, imm32. 32-bit mode doesn't have RIP-relative addressing.
Note that OS X loads all code at an address outside the low 32 bits, so 32-bit absolute addressing is unusable. Position-independent code isn't required for executables, but you might as well use it because 64-bit absolute addressing is less efficient than RIP-relative. The macho64 object file format doesn't support relocations for 32-bit absolute addresses the way Linux ELF does. Make sure not to use a label name as a compile-time 32-bit constant anywhere. An effective address like [global_array + constant] is fine because that can be assembled to a RIP-relative addressing mode. But [global_array + rcx] is not allowed because RIP can't be used with any other registers, so it would have to be assembled with the absolute address of global_array hard-coded as the 32-bit displacement (which will be sign-extended to 64 bits).
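The sign-extension rule decides which absolute addresses a disp32 can reach at all. A minimal C sketch, with helper names invented for this example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* A disp32 is sign-extended to 64 bits before it is used in the address. */
uint64_t sign_extend_disp32(int32_t disp)
{
    return (uint64_t)(int64_t)disp;
}

/* True if an absolute address is reachable as a sign-extended disp32:
   the low 2 GiB or the high 2 GiB of the 64-bit address space. */
bool fits_in_disp32(uint64_t address)
{
    return address <= 0x7FFFFFFFull || address >= 0xFFFFFFFF80000000ull;
}

/* The disp32 bit pattern for an address that fits. */
int32_t disp32_for(uint64_t address)
{
    assert(fits_in_disp32(address));
    uint32_t low = (uint32_t)address;
    int32_t disp;
    memcpy(&disp, &low, sizeof disp);   /* reinterpret the low 32 bits */
    return disp;
}
```

With these definitions, an address such as 0x400000 fits, while anything at or above 0x100000000 (and below the top 2 GiB) does not, which is the situation described above for code loaded outside the low 32 bits.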
Any and all of these addressing modes can be used with LEA to do integer math with a bonus of not affecting flags, regardless of whether it's a valid address. See Using LEA on values that aren't addresses / pointers?
[esi*4 + 10] is usually only useful with LEA (unless the displacement is a symbol, instead of a small constant). In machine code, there is no encoding for scaled-register alone, so [esi*4] has to assemble to [esi*4 + 0], with 4 bytes of zeros for a 32-bit displacement. It's still often worth it to copy+shift in one instruction instead of a shorter mov + shl, because usually uop throughput is more of a bottleneck than code size, especially on CPUs with a decoded-uop cache.
You can specify segment overrides like mov al, [fs:esi] (NASM syntax). A segment override just adds a prefix byte in front of the usual encoding. Everything else stays the same, with the same syntax.
You can even use segment overrides with RIP-relative addressing. In 64-bit mode, 32-bit absolute addressing takes one more byte to encode than RIP-relative, so mov eax, [fs:0] can most efficiently be encoded using a relative displacement that produces a known absolute address, i.e. choose rel32 so RIP+rel32 = 0. YASM will do this with mov ecx, [fs: rel 0], but NASM always uses disp32 absolute addressing, ignoring the rel specifier. I haven't tested MASM or gas.
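The "choose rel32 so RIP+rel32 = 0" step is plain arithmetic, sketched here in C. The function names are hypothetical, and next_rip stands for the address of the byte after the instruction, which is the RIP value the CPU uses. This only works when the code sits within 2 GiB of the target, e.g. in a non-PIE executable loaded at a low address.

```c
#include <assert.h>
#include <stdint.h>

/* RIP-relative target = address of the next instruction + rel32. */
uint64_t rip_relative_target(uint64_t next_rip, int32_t rel32)
{
    return next_rip + (uint64_t)(int64_t)rel32;
}

/* Hypothetical helper: the rel32 an assembler or linker would have to emit
   so that an instruction ending at next_rip references `target`.
   Assumes two's-complement conversion, as on all mainstream compilers. */
int32_t rel32_for(uint64_t next_rip, uint64_t target)
{
    int64_t delta = (int64_t)(target - next_rip);
    assert(delta >= INT32_MIN && delta <= INT32_MAX);   /* must be in range */
    return (int32_t)delta;
}
```

For an instruction ending at the assumed address 0x401000, reaching absolute address 0 needs rel32 = -0x401000.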
If the operand-size is ambiguous (e.g. in an instruction with an immediate and a memory operand), use byte / word / dword / qword to specify:
mov dword [rsi + 10], 123 ; NASM
mov dword ptr [rsi + 10], 123 ; MASM and GNU .intel_syntax noprefix
movl $123, 10(%rsi) # GNU(AT&T): operand size from mnemonic suffix
See the yasm docs for NASM-syntax effective addresses, and/or the wikipedia x86 entry's section on addressing modes.
The wiki page says what's allowed in 16bit mode. Here's another "cheat sheet" for 32bit addressing modes.
16-bit address size can't use a SIB byte, so all the one- and two-register addressing modes are encoded into the single mod/rm byte. In a two-register mode, the first register (reg1) can be BX or BP, and the second (reg2) can be SI or DI (or you can use any of those 4 registers by themselves). Scaling is not available. 16-bit code is obsolete for a lot of reasons, including this one, and not worth learning if you don't have to. (Or not until after you learn 32 or 64-bit.)
Note that the 16-bit restrictions apply any time you're using 16-bit address-size, including in 32-bit code with an address-size override prefix, so 16-bit LEA-math is highly restrictive. e.g. you can't do lea eax, [dx + cx*2] to do math and truncate + zero-extend. However, lea eax, [edx + ecx*2] does set ax = dx + cx*2, because garbage in the upper bits of the source registers has no effect on the low 16.
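The claim about the low 16 bits is ordinary modular arithmetic, which this small C sketch checks. The function names are invented for the example; uint32_t wraparound stands in for the 32-bit LEA result.

```c
#include <assert.h>
#include <stdint.h>

/* The 16-bit result we actually want: dx + cx*2 modulo 2^16. */
uint16_t want16(uint16_t dx, uint16_t cx)
{
    return (uint16_t)(dx + cx * 2u);
}

/* Models lea eax, [edx + ecx*2] followed by reading ax. */
uint16_t lea_low16(uint32_t edx, uint32_t ecx)
{
    uint32_t eax = edx + ecx * 2;   /* wraps at 32 bits, like the hardware */
    /* Upper-half garbage in edx/ecx cannot reach the low 16 bits. */
    assert((uint16_t)eax == want16((uint16_t)edx, (uint16_t)ecx));
    return (uint16_t)eax;
}
```

Calling lea_low16(0xDEAD1234, 0xBEEF8001) gives the same low 16 bits as want16(0x1234, 0x8001), whatever is in the upper halves of the inputs.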
See also Differences between general purpose registers in 8086: [bx] works, [cx] doesn't? for a list of the available addressing modes.
There's also a more detailed guide to addressing modes for 16-bit. You might want to read it to understand some fundamentals about how x86 CPUs use addresses because some of that hasn't changed for 32-bit mode.
Many of these are also linked above, but not all.