c++ lambda compiler-optimization inlining

Why won't compilers inline constant captures in my lambda?


Given the definition:

#include <cstddef>
#include <functional>

void call();

static constexpr inline auto nWrap(std::size_t n) {
    return [n]() constexpr { for (std::size_t i = 0; i != n; ++i) call(); };
}

std::function<void()> oneCall() { return nWrap(1); }
std::function<void()> nCalls(std::size_t n) { return nWrap(n); }

I'd expect optimizing compilers to make oneCall return a function object with the loop unrolled or eliminated. Yet gcc, clang, msvc and icc all emit a lambda body that still contains the loop:

.L3:
        call    call()
        addq    $1, %rbx
        cmpq    0(%rbp), %rbx
        jne     .L3

In fact, no expressions involving captured constants (e.g. sum of two constants) seem to be inlined. Is there something in the definition preventing inlining? (Since all compilers seem to agree.)

How can I define nWrap so that the caller can inline captured constants? (i.e. unroll the loop iff called with a compile-time constant n)

Manually moving the lambda definition into oneCall indeed eliminates the loop in the generated code:

constexpr const std::size_t n = 1;
return [n] { for (std::size_t i = 0; i != n; ++i) call(); };

std::_Function_handler<void (), oneCall()::{lambda()#1}>::_M_invoke(std::_Any_data const&):
        jmp     call()

Templating nWrap doesn't help unless n itself is a template parameter. Adding or removing various statics, inlines, and constexprs has little effect.
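For reference, the template-parameter variant mentioned above might look like the sketch below. The name nWrapFixed is hypothetical, and call() is given a counting body here only so the snippet can be run on its own (in the question it is an external function). Because N is part of the function template's signature, each value gets its own closure type, so the loop bound is a constant inside that closure's body.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Stand-in for the question's external call(); counts invocations so the
// behaviour can be checked. This body is an assumption for illustration.
static int callCount = 0;
void call() { ++callCount; }

// Hypothetical helper: n is a template parameter, so every N produces a
// distinct closure type whose body has a constant loop bound.
template <std::size_t N>
static constexpr auto nWrapFixed() {
    return [] { for (std::size_t i = 0; i != N; ++i) call(); };
}

// Runtime version, as in the question: one closure type for every n,
// with n stored as a data member of the closure.
static constexpr auto nWrap(std::size_t n) {
    return [n] { for (std::size_t i = 0; i != n; ++i) call(); };
}

std::function<void()> oneCall() { return nWrapFixed<1>(); }
std::function<void()> nCalls(std::size_t n) { return nWrap(n); }
```

The drawback is the one described above: the caller has to choose between two differently spelled functions, which is exactly what I'd like to avoid.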

The production code actually wraps a caller-supplied lambda (or other callable) in a std::function. The caller's lambda is inlined. Yet I can't find a way to eliminate the loop for the common single-call use case without defining two separate lambdas.


Solution

  • Commenters explained why nWrap can't return an optimized closure here: a body specialized for a particular n would have to be a different closure type, and a non-templated function has exactly one return type.

    With that in mind, I found that wrapping nWrap's return value in another lambda (and thus a distinct closure type) lets compilers eliminate the loop:

    static inline constexpr auto nWrap(std::size_t n) {
        return [n]() { for (std::size_t i = 0; i != n; ++i) call(); };
    }
    
    std::function<void()> nCalls(std::size_t n) {
        if (n == 1) {
            return []{nWrap(1)();};  // Return value embedded into another lambda
        }
        return nWrap(n);
    }
    
    std::function<void()> oneCall() { return nCalls(1); }
    

    This leaves oneCall always returning an optimized std::function, while nCalls decides at runtime which one to return. That isn't what I asked for, but it's what I'm settling for at the moment.
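A possible way to get closer to "unroll iff called with a compile-time constant" with a single nWrap is to let the argument's type carry the constant, for example via std::integral_constant. This is only a sketch under assumptions: the Count alias is hypothetical, call() has a counting body purely so the snippet is runnable, and whether a given compiler actually removes the loop should be checked in the generated assembly.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <type_traits>

// Stand-in for the external call(); the counting body is an assumption
// made so the sketch can be run and checked.
static int callCount = 0;
void call() { ++callCount; }

// N is either std::size_t (runtime count) or a std::integral_constant
// (count encoded in the type). Each N yields a distinct closure type.
template <class N>
static constexpr auto nWrap(N n) {
    return [n] {
        // For an integral_constant the conversion always yields the same
        // value, so the optimizer can treat the loop bound as fixed.
        const std::size_t count = n;
        for (std::size_t i = 0; i != count; ++i) call();
    };
}

// Hypothetical convenience alias for spelling a compile-time count.
template <std::size_t V>
using Count = std::integral_constant<std::size_t, V>;

std::function<void()> oneCall() { return nWrap(Count<1>{}); }
std::function<void()> nCalls(std::size_t n) { return nWrap(n); }
```

The call site still has to opt in by writing Count<1>{} instead of 1, since a plain function argument is never a constant expression inside the function body.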