In Programming from the Ground Up, chapter 3, I read:
The general form of memory address references is this:
ADDRESS_OR_OFFSET(%BASE_OR_OFFSET, %INDEX, MULTIPLIER)
All fields are optional. To calculate the address, simply perform the following calculation:
FINAL ADDRESS = ADDRESS_OR_OFFSET + %BASE_OR_OFFSET + MULTIPLIER * %INDEX
ADDRESS_OR_OFFSET and MULTIPLIER must both be constants, while the other two must be registers. If one of the pieces is left out, it is just substituted with zero in the equation.
Now, I assume that "substituted with zero" is a mistake, because if MULTIPLIER's default were 0, the value of %INDEX would be irrelevant: the product would always be zero. I guess 0 is the default for the other three?
Nonetheless, what confuses me most is this: from the description above, I understand that the parentheses and commas determine which parts of what we write map to the four "operands" of the address calculation.
But then, in the following chapter I read:
For example, the following code moves whatever is at the top of the stack into %eax:
movl (%esp), %eax
If we were to just do
movl %esp, %eax
%eax would just hold the pointer to the top of the stack rather than the value at the top.
But I don't understand why. Given the FINAL ADDRESS expression above, I would say that:
- %esp in parentheses plays the role of %BASE_OR_OFFSET, with ADDRESS_OR_OFFSET and %INDEX defaulting to 0 and MULTIPLIER to 1;
- %esp not in parentheses plays the role of ADDRESS_OR_OFFSET, with %BASE_OR_OFFSET and %INDEX defaulting to 0 and MULTIPLIER to 1;
and the sum would be the same in both cases.
Furthermore, how is %esp a constant? If %esp counts as constant because it is the name of a physically fixed register, then what is non-constant in this context?

Correct, the default multiplier is 1.
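A small C model can make the defaults concrete. This is only a sketch of the arithmetic, not of how an assembler or CPU is implemented, and the function name effective_address is made up for illustration. Omitted fields are passed as 0, except the multiplier, which is passed as 1:

```c
#include <assert.h>
#include <stdint.h>

/* Model of the AT&T-syntax address formula
 *   disp(base, index, scale)  ->  disp + base + scale * index
 * Omitted fields are represented by their defaults:
 *   disp = 0, base = 0, index = 0, scale = 1. */
static uint32_t effective_address(uint32_t disp, uint32_t base,
                                  uint32_t index, uint32_t scale)
{
    /* x86 can only encode multipliers of 1, 2, 4 or 8. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    /* Unsigned arithmetic wraps mod 2^32, like 32-bit address calculation. */
    return disp + base + scale * index;
}
```

With base = %esp and everything else at its default, the result is simply the value in %esp, which is the address that (%esp) dereferences.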
movl %esp, %eax isn't using a memory addressing mode at all. %esp there is a register-direct operand, so it's syntactically different from mov symbol_name, %eax (a load from an absolute address, where symbol_name is the constant displacement and there is no base or index register). There's a register, but it's not inside (), so the disp(base,idx,scale) syntax doesn't apply.
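To illustrate the difference between a register-direct operand and a memory operand, here is a toy machine in C. The struct machine type, the register-number constants, and the mov_reg_reg / mov_mem_reg helpers are all hypothetical; the point is only that one form copies the register's value, while the other uses that value as an address and loads from memory:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy machine: 8 registers and a small byte-addressed memory. */
enum { EAX = 0, ESP = 4, MEM_SIZE = 64 };

struct machine {
    uint32_t reg[8];
    uint8_t mem[MEM_SIZE];
};

/* Little-endian 32-bit load, as on x86. */
static uint32_t load32(const struct machine *m, uint32_t addr)
{
    assert(addr <= MEM_SIZE - 4);
    return (uint32_t)m->mem[addr]
         | (uint32_t)m->mem[addr + 1] << 8
         | (uint32_t)m->mem[addr + 2] << 16
         | (uint32_t)m->mem[addr + 3] << 24;
}

/* Models movl %src, %dst: register-direct, no memory access at all. */
static void mov_reg_reg(struct machine *m, int src, int dst)
{
    m->reg[dst] = m->reg[src];
}

/* Models movl (%base), %dst: the register's value is used as an address. */
static void mov_mem_reg(struct machine *m, int base, int dst)
{
    m->reg[dst] = load32(m, m->reg[base]);
}
```

Starting with %esp = 16 and 42 stored at address 16, mov_reg_reg(&m, ESP, EAX) leaves 16 in %eax (the pointer), while mov_mem_reg(&m, ESP, EAX) leaves 42 (the value at the top of the stack).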
In machine code, the ModR/M byte's 2-bit "mode" field uses 0b11 to encode that the operand is a register rather than memory. (The other 3 encodings select memory with no displacement vs. disp8 vs. disp32: https://wiki.osdev.org/X86-64_Instruction_Encoding#ModR.2FM_and_SIB_bytes. See also "rbp not allowed as SIB base?" for the fun special cases that allow a disp32 with no registers, and that make the SIB byte optional to save machine-code size for simple addressing modes.) With ModR/M.mode = 11, the 3-bit r/m field is just a register number. Similarly, in assembly language, when you use a bare register name you get the register operand directly; it is not used as an address to access memory.
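The ModR/M fields can be pulled apart with a few shifts and masks. This C sketch uses illustrative names (decode_modrm, rm_is_register) and deliberately ignores the mod = 00 special cases for r/m = 100 and 101. It checks the mode field of two real encodings: 89 E0 is movl %esp, %eax, and 8B 04 24 is movl (%esp), %eax:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields of an x86 ModR/M byte: mod (bits 7-6), reg (bits 5-3), r/m (bits 2-0). */
struct modrm {
    unsigned mod, reg, rm;
};

static struct modrm decode_modrm(uint8_t byte)
{
    struct modrm f = { byte >> 6, (byte >> 3) & 7u, byte & 7u };
    return f;
}

/* mod == 0b11: r/m names a register directly.
 * mod == 00, 01, 10: r/m selects a memory operand
 * (no displacement, disp8, disp32; special cases ignored here). */
static bool rm_is_register(struct modrm f)
{
    assert(f.mod <= 3);
    return f.mod == 3;
}
```

For 0xE0, mod = 11, so the r/m field (0, i.e. %eax) names a register. For 0x04, mod = 00 and r/m = 100, which means a memory operand whose address is described by the SIB byte that follows.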
(I'm not sure this is a useful analogy, but I think the useful point is that a register operand is a different thing from a memory operand even in x86 machine code. They are qualitatively different and need to be distinguished.)
To be precise about the default: it is a multiplier of 1, not 0. The shift count (the SIB byte's 2-bit scale field) is 0 in the machine code, but source-level asm syntax uses power-of-2 multipliers (in all x86 assembly syntaxes I've ever seen, including AT&T, all flavours of Intel, and Go's assembly dialect; it would of course be possible to invent a new syntax that used shift counts in the asm source).
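The relationship between the encoded shift count and the source-level multiplier can be modelled the same way. In this C sketch, sib_multiplier and scale_to_shift are hypothetical helper names; the stored 2-bit value is a shift count, and the multiplier is 1 shifted left by it, so an all-zero scale field means a multiplier of 1:

```c
#include <assert.h>
#include <stdint.h>

/* The SIB byte stores the scale as a 2-bit shift count in bits 7-6,
 * so the multiplier written in asm source is 1 << shift. */
static unsigned sib_multiplier(uint8_t sib)
{
    unsigned shift = sib >> 6;   /* 0..3 */
    return 1u << shift;          /* 1, 2, 4 or 8 */
}

/* Reverse direction: roughly what an assembler must do with a source multiplier. */
static unsigned scale_to_shift(unsigned multiplier)
{
    switch (multiplier) {
    case 1: return 0;
    case 2: return 1;
    case 4: return 2;
    case 8: return 3;
    default:
        assert(!"x86 only supports multipliers 1, 2, 4 and 8");
        return 0;
    }
}
```

For example, the SIB byte 0x24 from 8B 04 24 (movl (%esp), %eax) has a scale field of 00, i.e. shift count 0 and multiplier 1; its index field is 100, meaning no index register.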