Tags: c++, inlining, lto

C++ small function not inlining


I have a hot, critical-path function (about 45% of cycles:ppp according to perf record) in my C++17 application that is not being inlined as I would expect. It's a tiny function: it simply returns the value of an atomic pointer member. The disassembly confirms that the function is just four assembly instructions, including the retq. Furthermore, there is only a single caller of this function in the entire build. I've even declared the function __attribute__((always_inline)). Yet the compiler still generates a call to it and a return from it.

The caller is in file A and the callee is in file B.

An additional note:

Actually, I've simplified a bit: this lack of inlining happens in two places in my application. File B has a function F1, which calls File A's F2, which calls File B's F3 (F2 is the caller and F3 the callee described above).

File A:

void F2() {
  F3();
}

File B:

void F1() {
  F2();
}

void F3() {}

How can I get all of these to inline into one function? Another more fundamental question: can a function defined in a different file be inlined (perhaps using LTO)?


Solution

  • PS

    The always_inline attribute probably does not mean what you think it means. Normally g++ does not inline anything when optimizations are turned off (presumably because that makes debugging easier). Adding always_inline makes the compiler inline the function even when not optimizing (probably not what you want), but it does not turn a function that cannot be inlined into one that can or will be. In particular, the compiler can only inline a call when the function's body is visible in the translation unit being compiled; the attribute does not make the definition in B.cpp visible while A.cpp is being compiled.

    see: https://gcc.gnu.org/onlinedocs/gcc/Inline.html
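    As a sketch of the conditions the attribute needs, assuming a hypothetical Holder class in place of the one that owns your atomic pointer: the function is inline (an in-class member definition is implicitly inline) and its body is visible to the caller in the same translation unit, so always_inline can actually take effect.

    ```cpp
    #include <atomic>
    #include <cassert>

    // Hypothetical stand-in for the class that owns the atomic pointer member.
    struct Holder {
        std::atomic<int*> ptr{nullptr};

        // always_inline used where it can work: this in-class definition is
        // implicitly inline and its body is visible to every caller that sees
        // the class definition.
        [[gnu::always_inline]] int* get() const { return ptr.load(); }
    };

    // Hypothetical caller in the same translation unit; GCC should inline
    // get() here even at -O0 because of the attribute.
    int* read(const Holder& h) { return h.get(); }
    ```

    If get() were only declared here and defined in another .cpp file, the attribute could not help: there would be no body to inline.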

    Given your comments you have the following:

    File A.h

    void F2();
    

    File B.h

    void F1();
    void F3() __attribute__((always_inline));
    

    File A.cpp

    #include "A.h"
    #include "B.h"
    
    void F2() {
      F3();
    }
    

    File B.cpp

    #include "B.h"
    #include "A.h"
    
    void F1() {
      F2();
    }
    
    void F3() {}
    

    In the future, this is the kind of minimal reproducible example you should post: it has all the type information and is enough to rebuild your situation.

    The code you provided does not compile, and it takes a lot of effort to turn your English description into compilable code.

    If your build is set up for it, F3() can be inlined into F2() in A.cpp, but that is not guaranteed. For the compiler to perform this optimization, either the translation unit (A.cpp) must see the definition of F3(), or you must enable cross-translation-unit optimization, i.e. link-time optimization (with GCC or Clang, pass -flto both when compiling and when linking).

    The simpler option is to move the body of F3() into the header file and mark it inline. Its definition is then visible in every translation unit that includes B.h, so the compiler can inline it directly.

    File A.h

    void F2();
    

    File B.h

    void F1();
    void F3() __attribute__((always_inline)); // I would not add this.
                                              // Let the compiler skip inlining in unoptimized (debug) builds.
    inline void F3() {}
    

    File A.cpp

    #include "A.h"
    #include "B.h"
    
    void F2() {
      F3();
    }
    

    File B.cpp

    #include "B.h"
    #include "A.h"
    
    void F1() {
      F2();
    }
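    Moving F3() into B.h only makes it visible to A.cpp. To collapse the whole F1 -> F2 -> F3 chain into F1, the body of F2() must likewise be visible where F1() is compiled, i.e. moved into A.h as an inline function (or you rely on LTO). Here is a minimal single-file sketch of that arrangement, assuming a hypothetical global atomic pointer g_ptr as a stand-in for the atomic member from the question; it is an illustration, not your actual code.

    ```cpp
    #include <atomic>
    #include <cassert>

    // Hypothetical stand-in for the atomic pointer member from the question.
    // A C++17 inline variable, so it could also live in a header.
    inline std::atomic<int*> g_ptr{nullptr};

    // Would live in B.h: the tiny accessor, just an atomic load.
    inline int* F3() { return g_ptr.load(); }

    // Would live in A.h: its body must be visible where F1 is compiled
    // (B.cpp) for the whole chain to be inlined into F1.
    inline int* F2() { return F3(); }

    // Stays in B.cpp; it can now see both callee bodies.
    int* F1() { return F2(); }
    ```

    With optimization enabled, the compiler is then free to reduce F1() to the single atomic load. Whether it actually does so is up to its inlining heuristics, so confirm with the disassembly.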