Given the definition:
#include <cstddef>
#include <functional>
void call();
static constexpr inline auto nWrap(std::size_t n) {
    return [n]() constexpr { for (std::size_t i = 0; i != n; ++i) call(); };
}
std::function<void()> oneCall() { return nWrap(1); }
std::function<void()> nCalls(std::size_t n) { return nWrap(n); }
I'd expect optimizing compilers to make oneCall return a function object with the loop unrolled/eliminated.
Yet GCC, Clang, MSVC and ICC all generate code that still contains the loop:
.L3:
call call()
addq $1, %rbx
cmpq 0(%rbp), %rbx
jne .L3
In fact, no expressions involving captured constants (e.g. the sum of two constants) seem to be folded or inlined. Is there something in the definition preventing inlining? (I assume so, since all compilers agree.)
How can I define nWrap so that the caller can inline captured constants, i.e. unroll the loop iff it is called with a compile-time constant n?
Manually moving the lambda definition into oneCall does eliminate the loop in the generated code:
constexpr const std::size_t n = 1;
return [n] { for (std::size_t i = 0; i != n; ++i) call(); };
std::_Function_handler<void (), oneCall()::{lambda()#1}>::_M_invoke(std::_Any_data const&):
jmp call()
Templating nWrap doesn't, unless I use a template parameter for n. Adding or removing various statics, inlines, and constexprs has little effect.
The production code actually wraps a caller-supplied lambda (or other callable) in a std::function.
The caller's lambda is inlined. Yet I can't find a way to eliminate the loop for the common single-call use case without defining two separate lambda functions.
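For reference, a minimal sketch of the template-parameter variant that does get the loop eliminated. The names nWrapT, oneCallT, checkOneCallT and callCount are made up for illustration, and call() is given a counting body here only so the sketch links on its own:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical stand-in for the external call(): counts invocations.
int callCount = 0;
void call() { ++callCount; }

// N is a template parameter, so every N yields a distinct closure type
// whose operator() sees the loop bound as a compile-time constant.
template <std::size_t N>
auto nWrapT() {
    return [] { for (std::size_t i = 0; i != N; ++i) call(); };
}

std::function<void()> oneCallT() { return nWrapT<1>(); }

// Usage check: the wrapped functor invokes call() exactly once.
void checkOneCallT() {
    callCount = 0;
    oneCallT()();
    assert(callCount == 1);
}
```

The obvious drawback is that n can no longer be a runtime value, which is why this doesn't help nCalls.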
Commenters explained that nWrap can't produce an optimized return value here because doing so would change the return type of the (non-templated) function: the closure type is the same for every n, so n can only be a runtime member of it.
With that, I found that embedding nWrap's return value into another lambda definition (and thus a distinct type) enables compilers to eliminate the loop:
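A small sketch of what that means in practice, as far as I understand it. The counting body of call() and callCount are assumptions added so the snippet is self-contained. Both calls below have the same closure type, so std::function presumably gets a single invoker for that type and has to read n from the stored closure at run time (which would match the cmpq 0(%rbp), %rbx in the listing above):

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for the external call().
int callCount = 0;
void call() { ++callCount; }

static constexpr inline auto nWrap(std::size_t n) {
    return [n]() { for (std::size_t i = 0; i != n; ++i) call(); };
}

// One lambda expression means one closure type, whatever value n has.
static_assert(std::is_same_v<decltype(nWrap(1)), decltype(nWrap(2))>,
              "nWrap returns the same closure type for every n");
```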
static inline constexpr auto nWrap(std::size_t n) {
    return [n]() { for (std::size_t i = 0; i != n; ++i) call(); };
}
std::function<void()> nCalls(std::size_t n) {
    if (n == 1) {
        return []{ nWrap(1)(); }; // Return value embedded into another lambda
    }
    return nWrap(n);
}
std::function<void()> oneCall() { return nCalls(1); }
This results in oneCall always returning an optimized std::function and nCalls deciding at runtime what to return, which isn't what I asked for, but it's what I'm currently settling for.
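For the production case with a caller-supplied callable, the same runtime dispatch might look roughly like the sketch below. repeatCalls and checkRepeatCalls are hypothetical names, not the real code, and the sketch assumes the callable is copyable and invocable as void():

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical helper: wraps any copyable void() callable.
template <class F>
std::function<void()> repeatCalls(F f, std::size_t n) {
    if (n == 1) {
        // Distinct closure type with no captured loop bound, so no loop.
        return [f = std::move(f)]() mutable { f(); };
    }
    // General case: n stays a runtime member of the closure.
    return [f = std::move(f), n]() mutable {
        for (std::size_t i = 0; i != n; ++i) f();
    };
}

// Usage check: one call for n == 1, three more for n == 3.
void checkRepeatCalls() {
    int hits = 0;
    repeatCalls([&hits] { ++hits; }, 1)();
    assert(hits == 1);
    repeatCalls([&hits] { ++hits; }, 3)();
    assert(hits == 4);
}
```

This still amounts to two lambda definitions, just hidden in one place instead of being repeated at every call site.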