I'm writing a C compiler as a hobby and would like it to be able to link against C static libraries produced by MSVC.
I read the Microsoft x64 ABI, and it doesn't seem to have a strongly mandated alignment for integer primitive types. It "recommends" aligning them with their natural size, for example an int32 would be 4-byte aligned.
But when I compile a minimal program that passes many ints as parameters, it's clearly using 8-byte alignment for them, despite only referencing them as DWORDs.
int add_many_args(int a, int b, int c, int d, int e, int f, int g, int h) {
    return a + b + c + d + e + f + g + h;
}
a$ = 8
b$ = 16
c$ = 24
d$ = 32
e$ = 40
f$ = 48
g$ = 56
h$ = 64
add_many_args PROC
mov DWORD PTR [rsp+32], r9d
mov DWORD PTR [rsp+24], r8d
mov DWORD PTR [rsp+16], edx
mov DWORD PTR [rsp+8], ecx
mov eax, DWORD PTR b$[rsp]
mov ecx, DWORD PTR a$[rsp]
add ecx, eax
mov eax, ecx
add eax, DWORD PTR c$[rsp]
add eax, DWORD PTR d$[rsp]
add eax, DWORD PTR e$[rsp]
add eax, DWORD PTR f$[rsp]
add eax, DWORD PTR g$[rsp]
add eax, DWORD PTR h$[rsp]
ret 0
add_many_args ENDP
First question: why would it do that? Why isn't it aligning them to their natural size of 4 bytes?
Second question: since I'm writing a compiler that should be able to link against C static libraries, how am I supposed to know what alignment the library used, so that my code can correctly pass stack parameters to library functions? I hear people say that the "C ABI is stable", so where are these rules written down?
For natural alignment, look at local vars or struct layout; arg passing works differently. The Windows x64 calling convention makes every arg take exactly 8 bytes (one register or one stack slot), so variadic functions are easy: just dump the 4 arg-passing registers into the shadow space and index the args as an array.
It's normal for other calling conventions to make each arg take the stack space of a register, instead of having complicated rules for something like foo(int a, int64_t b, double c) to make sure the wider args are aligned.
The Windows x64 docs (https://learn.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-170#parameter-passing) don't clearly state that stack arg slots are always 8 bytes even for narrow types, but they are.
The normal reason for making stack args take a full stack slot is to allow narrow args to be written with push, but you don't normally do that in Windows x64 because the shadow space goes below them. So normally you'd sub rsp, imm8 at the top of a function and use mov to store args, not constantly push and deallocate / reallocate shadow space. I can't immediately think of a reason why packing narrow args wouldn't work (just enforcing that each one is aligned by at least alignof(T)), but it's not a big deal, especially since aligning RSP by 16 before a call would often mean rounding up the space needed for stack args anyway.
Godbolt with MSVC and GCC -O2 -mabi=ms, and GCC -O2 targeting Linux (-mabi=sysv being the default for Godbolt's Linux compilers).
int foo(){
    volatile int a = 1;
    volatile int b = 2;
    volatile int c = 3;
    return a+b+c;
}
Huh, strangely MSVC chooses to put each one in a separate 8-byte slot of the shadow space its caller reserved.
; x64 MSVC 19.40 -O2
c$ = 8 ; offsets from the return address where RSP points on function entry
b$ = 16
a$ = 24
int foo(void) PROC ; foo, COMDAT
mov DWORD PTR a$[rsp], 1
mov DWORD PTR b$[rsp], 2
mov DWORD PTR c$[rsp], 3
mov ecx, DWORD PTR c$[rsp]
mov eax, DWORD PTR b$[rsp] ; apparently it doesn't want to add eax, mem with volatile?
add ecx, eax
mov eax, DWORD PTR a$[rsp]
add eax, ecx
ret 0
But GCC does what I expected:
# Linux GCC 14.2 -O2 -mabi=ms
foo():
sub rsp, 24 # unfortunately fails to use its shadow space
mov DWORD PTR [rsp+4], 1
mov DWORD PTR [rsp+8], 2
mov DWORD PTR [rsp+12], 3
mov eax, DWORD PTR [rsp+4]
mov ecx, DWORD PTR [rsp+8]
mov edx, DWORD PTR [rsp+12] # volatile defeats add eax, mem
add rsp, 24
add eax, ecx
add eax, edx
ret
In a debug build with more variables, MSVC will pack them only 4 bytes apart. In an optimized build with a bunch more unused volatile variables, all =2 from copy/paste, it will store them all in the same place, [rsp+32]!! (I put an #if 0 in the godbolt link.)
struct int3{
    int a,b,c;
};
int bar(int3 st){
    return st.a + st.b + st.c;
}
; x64 MSVC -O2
int bar(int3) PROC ; bar, COMDAT
mov eax, DWORD PTR [rcx+8]
add eax, DWORD PTR [rcx+4]
add eax, DWORD PTR [rcx]
ret 0
Windows x64 passes objects whose size isn't exactly 1, 2, 4, or 8 bytes (like this 12-byte struct) by pointer to space allocated by the caller. So it's like bar(int3 &st), except the caller needs to make a copy, so that changes made to the arg object aren't visible in the caller's copy if its value is used after the call.
Just for fun, compare the x86-64 System V calling convention, which passes structs of up to 16 bytes in a pair of registers. In this case, that's the first two integer arg-passing regs for that convention, RDI and RSI:
# x86-64 Linux GCC -O2
bar(int3):
mov rax, rdi
shr rax, 32 # st.b
add eax, edi # st.a
add eax, esi # st.c