In the question "Idiom for initializing an std::array using a generator function taking the index?", which basically asks how one can initialize an array of an arbitrary type which is not necessarily default constructible, I came up with the following (highly unorthodox) solution.
    #include <cstddef>
    #include <utility>
    #include <array>
    #include <iostream>

    struct Int {
        int v;
        Int(int v) : v{v} {}
    };

    int main()
    {
        auto gen = [](size_t i) { return Int(11*(i+1)); };
        std::array<Int, 500000> arr = [&arr, &gen](){
            for(std::size_t i=0; i < arr.size(); i++)
                new (arr.data() + i) Int(gen(i));
            return arr;
        }();
        for(auto i : arr) { std::cout << i.v << ' '; }
        std::cout << '\n';
    }
In the solution, a functor (in this case a lambda, but I'm interested in the general case) is used to initialize the array. The functor takes the array to be initialized by reference, constructs the non-default-constructible elements via placement new, and then returns the array.
I am not entirely sure whether this is really legitimate. GCC, Clang and MSVC all seem to suggest this is valid. For GCC and Clang I have also turned on the sanitizer so that undefined behavior can be detected. The access of arr.size() seems fine as it is just a compile-time constant. The use of arr.data() also seems fine because the lifetime of the array arr starts after the = in std::array<int, 500000> arr= and arr has a well-defined address, which should just be what arr.data() returns because arr is an aggregate, but I'm not entirely sure. I'm also not sure whether the placement new is valid from the standard's perspective. For the arr = [&arr,&gen]{...; return arr;} I am also not sure whether the new rvalue semantics in C++17 are necessary to make the snippet valid, or whether it is also legitimate in earlier C++ standards (e.g. C++14).
So the question is, is it legitimate to access the to-be-initialized array this way in a functor that is used to initialize itself, and why?
For additional context, I have read the answers to the question that inspired this one (linked above) and the suggested duplicates there. The answers there basically boil down to one of two things:
1. Default-construct the array and assign the elements afterwards. This works when the array is not const, but not in the case where the type is not default constructible.
2. Use std::index_sequence. This suffers from the implementation limit of templates and won't work for large array sizes.

Based on these, I believe the solution here has its practical value unless someone can come up with a solution that does not suffer from the above two limitations, and this is not just a question of theoretical interest.
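For reference, the std::index_sequence approach can be sketched roughly as follows, assuming C++14. The names make_array and make_array_impl are hypothetical helpers chosen for illustration and are not taken from the linked answers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

struct Int {
    int v;
    Int(int v) : v{v} {}   // no default constructor
};

// Expands gen(0), gen(1), ..., gen(N-1) directly into the array's initializer,
// so no element is ever default-constructed.
template <class Gen, std::size_t... I>
auto make_array_impl(Gen&& gen, std::index_sequence<I...>)
    -> std::array<decltype(gen(std::size_t{})), sizeof...(I)>
{
    return {{ gen(I)... }};
}

// Hypothetical convenience wrapper: N is given explicitly, Gen is deduced.
template <std::size_t N, class Gen>
auto make_array(Gen&& gen)
{
    return make_array_impl(gen, std::make_index_sequence<N>{});
}
```

Each call expands a pack of N elements, which is where the implementation limits come in for sizes such as 500000.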
This is UB, and doesn't work correctly for most types.
> The use of arr.data() also seems fine because the lifetime of the array arr starts after the = in std::array<int, 500000> arr=
No, the lifetime starts when the initialization is complete. After = the name merely becomes visible. This makes arr.data() UB.
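The difference between a name being visible and its object being alive can be seen in a small, well-defined sketch. Node and self_pointer_matches are hypothetical names used only for illustration. Inside its own initializer the variable may have its address taken, but nothing more.

```cpp
#include <cassert>

struct Node {
    Node* self;
    int value;
};

bool self_pointer_matches() {
    // n is already in scope inside its own initializer. Only its address is
    // used here, which is allowed before the lifetime of n has begun.
    Node n = { &n, 7 };
    // Calling a member function on n inside the initializer (as arr.data()
    // does in the question) would go beyond merely using the address.
    return n.self == &n && n.value == 7;
}
```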
Returning arr will call the copy constructor of std::array with itself as the argument (because of mandatory copy elision, the object being returned into is arr itself), which will do the same for every element. Self-copy-construction is not something type authors ever expect (e.g. adding std::string x; as a data member to Int causes a segfault for me).
You can confirm this by adding logging to the copy constructor:
    Int(const Int &x) {std::cout << this << ' ' << &x << '\n';}
For each element, this prints the same address twice.
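The role of copy elision can also be seen in isolation, without any UB. In the sketch below, Tracker, make_tracker and result_object_is_destination are hypothetical names. The object created by the return statement is the variable being initialized, so no separate temporary exists. This is guaranteed from C++17 on, and compilers typically do the same in earlier modes.

```cpp
#include <cassert>

struct Tracker {
    const Tracker* constructed_at;  // address seen by the constructor
    bool copied;                    // set only by the copy constructor

    Tracker() : constructed_at(this), copied(false) {}
    Tracker(const Tracker&) : constructed_at(this), copied(true) {}
};

Tracker make_tracker() {
    return Tracker{};  // prvalue: initializes the caller's object directly
}

bool result_object_is_destination() {
    Tracker t = make_tracker();
    // With copy elision, the default constructor ran on t itself.
    return !t.copied && t.constructed_at == &t;
}
```

In the question's code the returned expression is arr rather than a prvalue, so the same mechanism makes the copy constructor run with arr as both source and destination.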
Pre-C++17, when there was no mandatory copy elision, another possible behavior was for arr to be copied into a temporary by return, which is then moved back into arr. That move uses the move constructor rather than move assignment, so it overwrites the existing elements without calling their destructors, which is also problematic.