Here is a C++ snippet. Func1 generates a shared object, which is directly moved into Func2. We think that there should not be any overhead in Func3. Putting this snippet into Compiler Explorer, we see code that is 2-3 times shorter with MSVC than with clang or GCC. Why is that, and can one obtain the shorter code with clang/GCC?
It looks like Func3 generates exception handling code for cleaning up the temporary shared object.
#include <memory>
std::shared_ptr<double> Func1();
void Func2 (std::shared_ptr<double> s);
void Func3()
{
    Func2(Func1());
}
The problem boils down to platform ABI, and is better illustrated by a completely opaque type:
struct A {
    A(const A&);
    A(A&&);
    ~A();
};
A make() noexcept;
void take(A) noexcept;
void foo() {
    take(make());
}
See the comparison at Compiler Explorer. The first listing is the MSVC output (32-bit x86), the second is the GCC output (x86-64):
void foo(void) PROC
        push ecx
        push ecx
        push esp
        call A make(void)
        add esp, 4
        call void take(A)
        add esp, 8
        ret 0
void foo(void) ENDP
foo():
        sub rsp, 24
        lea rdi, [rsp+15]
        call make()
        lea rdi, [rsp+15]
        call take(A)
        lea rdi, [rsp+15]
        call A::~A() [complete object destructor]
        add rsp, 24
        ret
If the type has a non-trivial destructor, the caller calls that destructor after control returns to it (including when the caller throws an exception).
- Itanium C++ ABI §3.1.2.3 Non-Trivial Parameters
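What takes place here under the Itanium ABI, which GCC and clang follow on x86-64 Linux, is:
- make() yields a prvalue of type A
- that prvalue initializes the parameter of take(A)
- the caller destroys the A at the call site, after take returns
MSVC instead destroys the temporary A (or in your case, std::shared_ptr) inside the callee, not at the call site. The extra code you're seeing is an inlined version of the std::shared_ptr destructor.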
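The lifetime described above can be observed with a small tracing type. This is only a sketch: Tracer, g_log and run_demo are hypothetical names that do not appear in the question, and the definitions of make and take are made up for the illustration. Note that the recorded order of events is the same under both ABIs when the call is the whole statement. What differs is which function contains the destructor call, and that is visible only in the generated assembly.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical event log, used only for this illustration.
static std::vector<std::string> g_log;

// Stand-in for the opaque type A: every special member records an event.
struct Tracer {
    Tracer() { g_log.push_back("ctor"); }
    Tracer(const Tracer&) { g_log.push_back("copy"); }
    Tracer(Tracer&&) noexcept { g_log.push_back("move"); }
    ~Tracer() { g_log.push_back("dtor"); }
};

// Returns a prvalue; under C++17 it initializes take's parameter directly.
Tracer make() { return Tracer{}; }

// Takes its argument by value, like take(A) in the answer.
void take(Tracer) { g_log.push_back("take body"); }

void foo() {
    take(make());
}

// Runs foo() once and returns the recorded events.
std::vector<std::string> run_demo() {
    g_log.clear();
    foo();
    assert(!g_log.empty());
    return g_log;
}
```

Calling run_demo() and printing the result shows the construction first and the destruction last, with the body of take in between. With C++17 guaranteed copy elision, no copy or move events should appear.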
In the end, you shouldn't see any major performance impact as a result. However, if Func2 resets/releases the shared pointer, then most of the destructor code at the call site is dead, unfortunately. This ABI problem is similar to an issue with std::unique_ptr:
There is also a language issue surrounding the order of destruction of function parameters and the execution of unique_ptr's destructor. For simplicity that is being ignored in this paper, but a complete solution to "unique_ptr is as cheap to pass as T*" would have to address that as well.
See also: Agner Fog, Calling conventions for different C++ compilers and operating systems
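To make the earlier remark about dead destructor code concrete, here is a minimal sketch of a Func2 that takes over ownership. The function bodies and the global g_sink are assumptions for illustration, since the question only declares Func1 and Func2. When Func2 is opaque to the caller (for example, defined in another translation unit), a compiler following the Itanium ABI still has to emit the cleanup for the parameter in Func3, even though at run time it only ever sees an empty pointer.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical long-lived owner; not part of the original snippet.
static std::shared_ptr<double> g_sink;

// Assumed definition: the question only declares Func1.
std::shared_ptr<double> Func1() {
    return std::make_shared<double>(1.5);
}

// Assumed definition: Func2 takes over ownership of the object.
void Func2(std::shared_ptr<double> s) {
    g_sink = std::move(s);
    assert(s == nullptr);  // a moved-from shared_ptr is guaranteed to be empty
}

void Func3() {
    // The parameter is empty by the time it is destroyed, so its
    // destructor has no reference count to decrement and nothing to free.
    Func2(Func1());
}
```

After Func3() returns, g_sink is the only owner of the double, so the destruction of the parameter did no useful work.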