c++ optimization visual-c++ inline

What rules do compilers actually use to decide whether to inline a function?


We have a macro for signalling errors in a common utilities library that goes like this:

#define OurMacro( condition ) \
    if( condition ) { \
    } else { \
        CallExternalFunctionThatWillThrowAnException( parametersListHere ); \
    }

What I refer to as parametersListHere is a comma-separated list of constants and macros that is populated by the compiler at each macro expansion.

That function call always compiles to an actual call, because the function's implementation is not exposed to the compiler. The function has six parameters. In the debug configuration all of them carry meaningful values, while in the release configuration only two do and the other four are passed the same default values.

Normally the condition holds, so I don't care how fast the invocation is; I only care about code bloat. Calling that function with six parameters takes seven x86 instructions (six pushes and one call), and clearly four of those pushes can be avoided if the function signature is changed to take only two parameters. This can be done by introducing an intermediate "gate" function implemented in such a way that its implementation is not visible to the compiler.
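To make the idea concrete, here is a minimal sketch of such a gate. All names (ThrowDetailed, ThrowGate, OUR_MACRO_SLIM, CheckedDivide) and the choice of file and line as the two meaningful parameters are assumptions made for illustration, not the real library. Both functions are defined in the same file only so the sketch compiles on its own; in the real setup their bodies would live in a separate translation unit so the compiler cannot see them.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the six-parameter external function.
// In the real library its body would not be visible at the call site.
[[noreturn]] void ThrowDetailed(const char* file, int line, const char* expression,
                                const char* function, int category, int flags)
{
    (void)expression; (void)function; (void)category; (void)flags;
    throw std::runtime_error(std::string(file) + ":" + std::to_string(line));
}

// Hypothetical gate: call sites pass only the two meaningful arguments,
// and the four default values are supplied once, inside the gate.
[[noreturn]] void ThrowGate(const char* file, int line)
{
    ThrowDetailed(file, line, "", "", 0, 0);
}

// Same shape as the original macro, but the expansion now sets up two arguments.
#define OUR_MACRO_SLIM(condition) \
    if (condition) { \
    } else { \
        ThrowGate(__FILE__, __LINE__); \
    }

// Example call site (hypothetical).
int CheckedDivide(int a, int b)
{
    OUR_MACRO_SLIM(b != 0)
    return a / b;
}

// Usage example: the passing path returns normally, the failing path throws.
void DemoGate()
{
    assert(CheckedDivide(6, 3) == 2);
    bool thrown = false;
    try { CheckedDivide(1, 0); } catch (const std::runtime_error&) { thrown = true; }
    assert(thrown);
}
```

The behaviour at the call site is unchanged; only the number of arguments set up at each expansion differs.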

I need to estimate whether I should insist on that change. The main improvement I expect is that dropping four parameters removes four instructions from each invocation, so the code surrounding each macro expansion becomes smaller and the compiler becomes more likely to inline it and optimize the emitted code further.

How can I estimate that without actually trying and recompiling all our code and carefully analyzing the emitted code? Every time I read about inline there's a statement that the compiler decides whether to inline the function.

Is there an exact set of rules describing how a function's internals influence the compiler's decision on inlining?


Solution

  • GCC has a fairly large set of options that expose how its inlining process works; they are documented in the GCC manual under the optimization options. The rules are of course not exact, since they are tweaked over time and are CPU-dependent.

    The first rule (-finline-small-functions) is "their body is smaller than expected function call code". A second rule (-finline-functions-called-once) is "static functions called once".

    There are also parameters affecting the inlining process, e.g. max-inline-insns-single. An insn is a pseudo-instruction in the GCC compiler, and is used here as a measure of function complexity. The documentation of the parameter max-inline-insns-auto makes it clear that manually declaring a function inline might cause it to be considered for inlining even if it is too big for automatic inlining.

    Inlining isn't an all-or-nothing process, since there's a -fpartial-inlining flag.

    Of course, you can't consider inlining in isolation. Common Subexpression Elimination (CSE) makes code simpler. It's an optimization pass that may make a function small enough to be inlined. After inlining, new common subexpressions may be discovered so the CSE pass should be run again, which in turn might trigger further inlining. And CSE isn't the only optimization that needs rerunning.
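One way to observe these rules rather than predict them is to compile a small file and ask GCC to report its decisions. The sketch below is only an illustration: the function names are made up, the parameter value in the second command is arbitrary, and which functions actually get inlined depends on the GCC version, the target and the optimization level.

```cpp
// Suggested experiments (GCC only; what gets reported varies by version):
//   g++ -O2 -c example.cpp -fopt-info-inline-optimized-missed
//   g++ -O2 -c example.cpp --param max-inline-insns-single=50 -Winline
#include <cassert>

// Small body: a candidate for the "smaller than expected function call code" rule.
static int Square(int x) { return x * x; }

// Static function with a single call site: a candidate for the
// "static functions called once" rule.
static int SumOfSquares(const int* values, int count)
{
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += Square(values[i]);
    return total;
}

// Externally visible entry point, so the code above is not discarded as unused.
int Entry(const int* values, int count)
{
    return SumOfSquares(values, count);
}

// Quick self-check so the file can also be exercised at run time.
void DemoInlineCandidates()
{
    const int values[] = {1, 2, 3};
    assert(Entry(values, 3) == 14);
}
```

Comparing the report before and after a change such as reducing a call's parameter count shows whether the change moved any function across the inlining threshold for that particular build.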