If we look at a few modern calling conventions, such as the x86-64 System V ABI or the AArch64 one (the document aapcs64.pdf, titled "Procedure Call Standard for the Arm® 64-bit Architecture"), we see explicit statements that variadic arguments are passed the same way as other arguments. For example, for a function call open(path, mode, cflags), x86-64 passes path in RDI, mode in RSI, and cflags (the only variadic one) in RDX.
Passing the fixed arguments in registers is uncontroversial: it saves resources. But if we look at a function that actually interprets its variadic arguments, and so calls va_start on them, we see that va_start compiles into spilling every register that could hold an argument (typically many more than are actually passed) onto the stack. For example, a full emulation of printf via vfprintf starts with the following (I collapsed similar rows to keep the listing short):
my_printf:
endbr64
; nearly unconditional saving
subq $216, %rsp
movq %rsi, 40(%rsp)
<...>
movq %r9, 72(%rsp)
testb %al, %al
je .L2
movaps %xmm0, 80(%rsp)
<...>
movaps %xmm7, 192(%rsp)
; repacking into registers for enclosed vfprintf
.L2:
movq %fs:40, %rax
movq %rax, 24(%rsp)
xorl %eax, %eax
movl $8, (%rsp)
movl $48, 4(%rsp)
leaq 224(%rsp), %rax
movq %rax, 8(%rsp)
leaq 32(%rsp), %rax
movq %rax, 16(%rsp)
movq %rsp, %rcx
movq %rdi, %rdx
movl $1, %esi
; finally, call the function
movq stdout(%rip), %rdi
call __vfprintf_chk@PLT
... skipped epilogue
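Here the register save area occupies 176 bytes (offsets 32 to 208 in the listing), of which 168 are actually written (rsi..r9 and xmm0..xmm7). Similarly, the AArch64 version pushes 184 bytes (x1..x7 and q0..q7).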
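For reference, the C source behind a listing like this is presumably very small. The following is a minimal sketch, not the exact source that was compiled: the name my_printf comes from the listing, while the body is an assumption (the call to __vfprintf_chk in the listing suggests plain vfprintf built with fortification enabled).

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical reconstruction of the wrapper; the original source is not shown. */
int my_printf(const char *fmt, ...)
{
    va_list ap;
    int written;

    va_start(ap, fmt);                    /* on x86-64 SysV this forces the register spill */
    written = vfprintf(stdout, fmt, ap);  /* forward the whole variadic tail */
    va_end(ap);
    return written;                       /* number of characters printed, as with printf */
}
```

Everything between the prologue and the call in the listing is generated from the single va_start line.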
If the variadic tail of every function call were always put on the stack, the code would be much simpler and cheaper at run time, because none of this packing and copying would be needed. va_start would reduce to a single move of the variadic list's starting address (on the stack) into a variable. This is how it actually worked on i386 (where all arguments were passed on the stack). Assembly output of the same trivial wrapper for Linux/i386:
my_printf:
pushl %ebx
subl $8, %esp
call __x86.get_pc_thunk.bx
addl $_GLOBAL_OFFSET_TABLE_, %ebx
leal 20(%esp), %eax ; <--- This is va_start
pushl %eax ; VA pointer pushed for vfprintf
pushl 20(%esp)
pushl $1
movl stdout@GOT(%ebx), %eax
pushl (%eax)
call __vfprintf_chk@PLT
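To make the "single pointer" idea concrete, here is a small model in C of how a stack-only va_list behaves. It is only a sketch under stated assumptions: stack_va_list, stack_va_arg_i32 and sum_from_stack are hypothetical names, and a plain array stands in for the caller's argument slots. It is not how any real stdarg.h is written.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of an i386-style va_list: just a cursor into the
   caller's argument area. */
typedef const unsigned char *stack_va_list;

/* Model of va_arg(ap, int): read one 4-byte slot and advance the cursor. */
static int32_t stack_va_arg_i32(stack_va_list *ap)
{
    int32_t value;
    memcpy(&value, *ap, sizeof value);  /* memcpy sidesteps aliasing concerns */
    *ap += sizeof value;
    return value;
}

/* Sums `count` 32-bit integers laid out back to back at `arg_area`,
   which stands in for the stack slots holding the variadic tail. */
int32_t sum_from_stack(const void *arg_area, int count)
{
    stack_va_list ap = arg_area;        /* model of va_start: one pointer copy */
    int32_t total = 0;

    for (int i = 0; i < count; i++)
        total += stack_va_arg_i32(&ap);
    return total;
}
```

In this model there is no save area and no offset bookkeeping; the leal marked as va_start in the i386 listing plays the role of the pointer copy.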
Hence the question: why is the implementation of variadic arguments, at least on x86-64 and AArch64, so complicated and wasteful?
(I can imagine there being cases where the same function had to be declarable in both styles, with fixed arguments and with a variadic list. But I donʼt know of such a case. The open mentioned above is unlikely to be one.)
Note that not all calling conventions do so. For example, the AArch64 calling convention used on macOS passes variadic arguments on the stack.
That said, a key motivation for passing variadic arguments in registers is that neither caller nor callee then needs to know whether a function is variadic. For example, if you were to call a prototype-less function declared like this:
int printf();
you wouldn't be able to know whether it's a variadic function or not. But because variadic and non-variadic functions share the same calling convention, the caller can simply set AL (an upper bound on the number of vector registers used for arguments) as if it were calling a variadic function, and a non-variadic callee simply ignores AL.
This is not possible with the macOS calling convention, where executing programs that don't consistently declare variadic functions with prototypes will fail.