[SOLVED] Why does this function push RAX to the stack as the first operation?

In the assembly generated from the C++ source below, why is RAX pushed to the stack?

As I understand the ABI, RAX could contain anything left over from the calling function. But we save it here, and then later move the stack pointer back by 8 bytes. So the RAX on the stack is, I think, only relevant for the std::__throw_bad_function_call() operation ... ?

The code:

#include <functional> 

void f(std::function<void()> a) 
{
  a(); 
}

Output, from gcc.godbolt.org, using Clang 3.7.1 -O3:

f(std::function<void ()>):                  # @f(std::function<void ()>)
        push    rax
        cmp     qword ptr [rdi + 16], 0
        je      .LBB0_1
        add     rsp, 8
        jmp     qword ptr [rdi + 24]    # TAILCALL
.LBB0_1:
        call    std::__throw_bad_function_call()

I'm sure the reason is obvious, but I'm struggling to figure it out.

Here's a tailcall without the std::function<void()> wrapper for comparison:

void g(void(*a)())
{
  a(); 
}

The output is trivial:

g(void (*)()):             # @g(void (*)())
        jmp     rdi        # TAILCALL

Solution

The 64-bit (x86-64) ABI requires the stack to be 16-byte aligned immediately before a call instruction.

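call pushes an 8-byte return address on the stack, which breaks the alignment, so the compiler needs to do something to align the stack again to a multiple of 16 before the next call. In f, push rax is that something: it moves rsp down by another 8 bytes, so the stack is aligned again at the call to std::__throw_bad_function_call(). The value stored is irrelevant. On the tail-call path, add rsp, 8 undoes the push, because jmp does not push a return address and the target must see the stack exactly as f did on entry.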
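
The alignment arithmetic can be checked with a small compile-time model. This is only a sketch: the helper names (after_call, after_push, after_add8, aligned) and the sample address are hypothetical, chosen for illustration, and the code models just the rsp adjustments, not real machine state.

```cpp
#include <cstdint>

// Hypothetical model of rsp adjustments; names are illustrative only.

// `call` pushes an 8-byte return address, so rsp drops by 8.
constexpr std::uint64_t after_call(std::uint64_t rsp) { return rsp - 8; }

// `push rax` stores 8 bytes; only the rsp adjustment matters here.
constexpr std::uint64_t after_push(std::uint64_t rsp) { return rsp - 8; }

// `add rsp, 8` undoes the push before the tail-call `jmp`.
constexpr std::uint64_t after_add8(std::uint64_t rsp) { return rsp + 8; }

// True when rsp is a multiple of 16.
constexpr bool aligned(std::uint64_t rsp) { return rsp % 16 == 0; }

// Assumed sample value: any 16-byte-aligned address would do.
constexpr std::uint64_t caller_rsp = 0x7ffc00001000ULL;

// rsp as seen by the first instruction of f, right after the caller's `call`.
constexpr std::uint64_t entry_rsp = after_call(caller_rsp);

static_assert(aligned(caller_rsp), "caller aligns the stack before its call");
static_assert(entry_rsp % 16 == 8, "on entry, rsp is off by 8");
static_assert(aligned(after_push(entry_rsp)),
              "after push rax, a further call is legal");
static_assert(after_add8(after_push(entry_rsp)) == entry_rsp,
              "add rsp, 8 restores the entry state for the tail call");
```

If the model is right, the file compiles without errors. The last assertion corresponds to the tail-call path: the jmp target sees the same rsp that f saw on entry, so its ret returns directly to f's caller.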

(The ABI design choice of requiring alignment before a call instead of after has the minor advantage that if any args were passed on the stack, this choice makes the first arg 16B-aligned.)

Pushing a don't-care value works well, and can be more efficient than sub rsp, 8 on CPUs with a stack engine.