Given a chain of lambdas where each one captures the previous one by value:
auto l1 = [](int a, int b) { std::cout << a << ' ' << b << '\n'; };
auto l2 = [=](int a, int b) { std::cout << a << '-' << b << '\n'; l1(a, b); };
auto l3 = [=](int a, int b) { std::cout << a << '#' << b << '\n'; l2(a, b); };
auto l4 = [=](int a, int b) { std::cout << a << '%' << b << '\n'; l3(a, b); };
std::cout << sizeof(l4);
We can observe, that the resulting sizeof
of l4
is equal to 1
.
That makes sense to me. We are capturing lambdas by value and each of those objects has to have sizeof
equal to 1
, but since they are stateless, an optimization similar to [[no_unique_address]]
one applies (especially since they all have unique types).
However, when I try to create a generic builder for chaining comparators, this optimization no longer takes place:
template <typename Comparator>
auto comparing_by(Comparator&& comparator) {
return comparator;
}
template <typename Comparator, typename... Comparators>
auto comparing_by(Comparator&& comparator, Comparators&&... remaining_comparators) {
return [=](auto left, auto right) {
auto const less = comparator(left, right);
auto const greater = comparator(right, left);
if (!less && !greater) {
return comparing_by(remaining_comparators...)(left, right);
}
return less;
};
}
struct triple {
int x, y, z;
};
auto main() -> int {
auto by_x = [](triple left, triple right) { return left.x < right.x; };
auto by_y = [](triple left, triple right) { return left.y < right.y; };
auto by_z = [](triple left, triple right) { return left.z < right.z; };
auto comparator = comparing_by(by_x, by_z, by_y);
std::cout << sizeof(comparator);
}
Note 1: I am aware of the fact that comparing_by
is inefficient and sometimes calls the comparator in a redundant fashion.
Why in the above case the resulting sizeof
of comparator
is equal to 3
and not to 1
? It is still stateless, after all. Where am I wrong? Or is it just a missed optimization in all of the big three compilers?
Note 2: This is purely an academic question. I am not trying to solve any particular problem.
What's happening in the first example is not what you think it is. Let's say l1
has type L1
, l2
L2
, etc. These are the members of those types:
struct L1 {
// empty;
};
sizeof(L1) == 1
struct L2 {
L1 l1;
};
sizeof(L2) == sizeof(L1) // 1
struct L3 {
L2 l2;
};
sizeof(L3) == sizeof(L2) // 1
struct L4 {
L3 l3;
};
sizeof(L4) == sizeof(L3) // 1
And in your next example, you capture all the lambdas by value, so the closure type has three non-overlapping members, so the size will be at least 3.
[[no_unique_address]]
can't be generically applied to the data members of a closure type (consider a empty class that puts its address in a global map).
The compiler could use empty base optimisation for a "well behaved type" (a trivilly-copyable empty type maybe?), so this might be a missed optimisation. The standard says this about what can be done ([expr.prim.lambda.closure]p2):
The closure type is not an aggregate type. An implementation may define the closure type differently from what is described below provided this does not alter the observable behavior of the program other than by changing:
- the size and/or alignment of the closure type,
- whether the closure type is trivially copyable ([class.prop]), or
- whether the closure type is a standard-layout class ([class.prop]).
So the change in size is OK, but it would have to be done so that is_empty_v<lambda_that_captures_stateless_lambda>
is not true
(since that's an observable behaviour)
To "manually" apply this optimisation, you can, instead of calling the lambda comparator(left, right)
, default construct something of the type of the closure type and call that (decltype(comparator){}(left, right)
). I've implemented that here: https://godbolt.org/z/73M1Gd3o5