c++ move shared-ptr overhead-minimization

Overhead for moving std::shared_ptr?


Here is a C++ snippet. Func1 returns a shared object, which is passed directly to Func2. We expected Func3 to add no overhead. In Compiler Explorer, however, MSVC generates code that is 2-3 times shorter than what clang or GCC generate. Why is that, and can the shorter code be obtained with clang/GCC?

It looks like Func3 generates exception handling code for cleaning up the temporary shared object.

#include <memory>

std::shared_ptr<double> Func1();
void Func2 (std::shared_ptr<double> s);

void Func3()
{
  Func2(Func1());
}

Solution

  • The problem boils down to the platform ABI, and is better illustrated by a completely opaque type:

    struct A {
        A(const A&);
        A(A&&);
        ~A();
    };
    
    A make() noexcept;
    void take(A) noexcept;
    
    void foo() {
        take(make());
    }
    

    See comparison at Compiler Explorer

    MSVC Output

    void foo(void) PROC
            push    ecx
            push    ecx
            push    esp
            call    A make(void)
            add     esp, 4
            call    void take(A)
            add     esp, 8
            ret     0
    void foo(void) ENDP
    

    GCC Output (clang is very similar)

    foo():
            sub     rsp, 24
            lea     rdi, [rsp+15]
            call    make()
            lea     rdi, [rsp+15]
            call    take(A)
            lea     rdi, [rsp+15]
            call    A::~A() [complete object destructor]
            add     rsp, 24
            ret
    

    If the type has a non-trivial destructor, the caller calls that destructor after control returns to it (including when the caller throws an exception).

    - Itanium C++ ABI §3.1.2.3 Non-Trivial Parameters

    Explanation

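    What takes place in the GCC output is: foo() reserves stack space for the temporary A, make() constructs the object there, take(A) receives it, and foo() then calls A::~A() itself, as the Itanium C++ ABI requires.

    MSVC instead destroys the temporary A (or in your case, std::shared_ptr) inside the callee, not at the call site. The extra code you're seeing with clang/GCC is an inlined version of the std::shared_ptr destructor, together with the exception-handling cleanup that runs it if Func2 throws.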
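The difference in who destroys the parameter can also be observed from within the language, because C++17 leaves it implementation-defined whether a parameter's lifetime ends when the function returns or at the end of the enclosing full-expression. The following is a minimal sketch, not part of the original answer: g_log and trace() are hypothetical helpers, and A is given printing members purely for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical trace log, used only to record the order of events.
std::string g_log;

struct A {
    A() { g_log += "ctor;"; }
    A(const A&) { g_log += "copy;"; }
    A(A&&) noexcept { g_log += "move;"; }
    ~A() { g_log += "dtor;"; }
};

A make() { return A(); }
void take(A) { g_log += "take;"; }

// Calls take(make()) and appends a marker within the same full-expression,
// so the position of "dtor;" relative to "after;" shows when the parameter dies.
std::string trace() {
    g_log.clear();
    (take(make()), g_log += "after;");
    // The full-expression has ended, so the parameter is destroyed by now.
    assert(g_log.find("dtor;") != std::string::npos);
    return g_log;
}
```

With caller-side destruction (the Itanium ABI used by GCC and clang on Linux) one would expect "dtor;" to come after "after;", while with callee-side destruction (MSVC) it should come before. Both orders are conforming, so code should not rely on either.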

    In the end, you shouldn't see any major performance impact as a result. However, if Func2 resets/releases the shared pointer, then most of the destructor code at the call site is dead, unfortunately. This ABI problem is similar to an issue with std::unique_ptr:

    There is also a language issue surrounding the order of destruction of function parameters and the execution of unique_ptr's destructor. For simplicity that is being ignored in this paper, but a complete solution to "unique_ptr is as cheap to pass as a T*" would have to address that as well.
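To make the point about dead destructor code concrete, here is a minimal sketch in which Func2 moves its parameter into longer-lived storage. The bodies of Func1 and Func2 and the global g_sink are assumptions for illustration; the question only declares the two functions.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical long-lived owner that Func2 hands the pointer to.
std::shared_ptr<double> g_sink;

// Assumed body: the question only declares Func1.
std::shared_ptr<double> Func1() {
    return std::make_shared<double>(1.5);
}

// Assumed body: Func2 takes ownership by moving out of its parameter.
void Func2(std::shared_ptr<double> s) {
    g_sink = std::move(s);
    // A moved-from shared_ptr is guaranteed to be empty.
    assert(s == nullptr);
}

void Func3() {
    // Under the Itanium ABI the parameter object is destroyed here by Func3.
    // It is already empty at that point, so the destructor has nothing to
    // release, but Func3 cannot know that when Func2 is opaque.
    Func2(Func1());
}
```

After Func3() returns, g_sink is the sole owner of the double. The destructor that Func3 runs on the emptied parameter only performs a null check at run time, yet its full code still has to be emitted at the call site.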


    See Also

    Agner Fog - Calling conventions for different C++ compilers and operating systems