MSDN includes this picture to describe how the stack is used in the x64 fastcall calling convention:
My question is: Why is the frame pointer set where the arrow is shown? After all, traditionally we want this after the call:
push rbp
mov rbp, rsp
; allocate stack memory for local vars and for register and stack parameter area
; function's work
mov rsp, rbp
pop rbp
ret
We access the parameter area for function B with:
mov [rbp + 10h], rcx
mov [rbp + 18h], rdx
mov [rbp + 20h], r8
mov [rbp + 28h], r9
Local variables with:
mov qword ptr [rbp - 8h], 1
mov qword ptr [rbp - 10h], 2
; etc
And access the parameter area for future function calls with:
; rcx rdx r8 r9 for args 1-4
mov qword ptr [rsp + 20h], 1 ; fifth arg, first stack arg
; etc
Is that right, or am I messing something up?
An example on Godbolt with MSVC19.14 -O2 appears to confirm that MSVC really does put the frame pointer there, not in a fixed location relative to the return address. So it has to rely on stack-unwind metadata without the possibility of fallback to traditional frame-pointer linked-list following.
Normally (with no command-line options) MSVC doesn't use RBP as a frame pointer, not even in debug builds, at least for simple functions. It does so only in functions that use alloca, or perhaps for other reasons.
The x86-64 System V ABI (used on everything except Windows) does put RBP where you expect, in the traditional location right below the return address, if you use it as a traditional frame pointer at all. (gcc/clang do that only without optimization, or if optimizing for code-size, or when using alloca or over-aligning the stack.)
Putting RBP closer to the middle of the space you might want to access increases the amount of space you can reach with [rbp + disp8], which uses a sign-extended 8-bit displacement in the addressing mode. x86-64 System V has a 128-byte red-zone (below RSP), with 128 bytes chosen for that reason: you can access 128 bytes of local and arg space at and above RSP, and another 128 bytes below. Accessing farther away from a register needs [reg+disp32], a 4-byte offset, so 3 extra bytes per instruction that accesses stack space.
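To make the disp8 argument concrete, here is a rough sketch in C (not taken from MSVC or the ABI documents) that counts how many 8-byte slots of a frame can be addressed with a 1-byte displacement for a given frame-pointer position. The function names and the offsets-relative-to-RSP model are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* True when a displacement fits the sign-extended 8-bit (disp8) form. */
bool fits_disp8(long disp) {
    return disp >= -128 && disp <= 127;
}

/*
 * Hypothetical model: the frame occupies offsets [0, frame_bytes) above RSP,
 * and the frame pointer sits fp_off bytes above RSP.
 * Returns how many 8-byte slots are reachable as [rbp + disp8].
 */
int slots_reachable(long frame_bytes, long fp_off) {
    assert(frame_bytes % 8 == 0);
    int count = 0;
    for (long off = 0; off < frame_bytes; off += 8) {
        if (fits_disp8(off - fp_off)) {
            count++;
        }
    }
    return count;
}
```

For a made-up 512-byte frame, slots_reachable(0x200, 0x200) (frame pointer at the top of the frame) gives 16 slots, while slots_reachable(0x200, 0x80) (frame pointer 128 bytes above RSP) gives 32, so a frame pointer placed inside the frame doubles what the short encoding can reach.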
(It's rare to have many bytes of stack args, especially in Windows x64 which passes large structs by reference, not by value on the stack. So every arg takes exactly one 8-byte stack slot, making variadic functions simple.)
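Because every argument occupies exactly one 8-byte slot, the location of argument n is simple arithmetic. The sketch below (helper names are hypothetical) assumes the traditional push rbp / mov rbp, rsp prologue from the question, not the MSVC placement discussed above, and reproduces the offsets the question uses: [rsp + 20h] for the fifth argument at the call site, and [rbp + 10h] for the first argument's home slot inside the callee.

```c
#include <assert.h>

/* Offset from RSP, just before the call instruction, of the 8-byte slot
 * for argument n (1-based). Slots 1-4 are the shadow space for
 * RCX, RDX, R8, R9; slot 5 is the first real stack argument. */
long caller_arg_offset(int n) {
    assert(n >= 1);
    return 8L * (n - 1);
}

/* Offset from RBP of the same slot inside the callee, assuming a
 * traditional prologue: the return address and the saved RBP (8 bytes
 * each) now sit between RBP and the argument slots. */
long callee_arg_offset(int n) {
    return caller_arg_offset(n) + 16;
}
```

With these definitions, caller_arg_offset(5) is 0x20 and callee_arg_offset(1) through callee_arg_offset(4) are 0x10, 0x18, 0x20 and 0x28.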