With C++11, the STL now has a std::iota function (see a reference). In contrast to std::fill_n and std::generate_n, however, there is no std::iota_n. What would be a good implementation for that? A direct loop (alternative 1) or delegation to std::generate_n with a simple lambda expression (alternative 2)?
Alternative 1)
template<class OutputIterator, class Size, class T>
OutputIterator iota_n(OutputIterator first, Size n, T value)
{
    while (n--)
        *first++ = value++;
    return first;
}
Alternative 2)
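#include <algorithm> // std::generate_n

template<class OutputIterator, class Size, class T>
OutputIterator iota_n(OutputIterator first, Size n, T value)
{
    // The lambda captures value by reference, so each call yields the next value.
    return std::generate_n(first, n, [&](){ return value++; });
}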
Would both alternatives generate equivalent code with optimizing compilers?
UPDATE: incorporated the excellent point of @Marc Mutz to also return the iterator at its destination point. This is also how std::generate_n got updated in C++11 compared to C++98.
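To illustrate why returning the iterator is handy, here is a minimal sketch that fills two consecutive runs in one buffer. It repeats alternative 1 so that it is self-contained; the helper name two_runs and the chosen values are hypothetical and only there for illustration.

```cpp
#include <cassert>
#include <vector>

// Alternative 1 from above, repeated so this snippet compiles on its own.
template<class OutputIterator, class Size, class T>
OutputIterator iota_n(OutputIterator first, Size n, T value)
{
    while (n--)
        *first++ = value++;
    return first;
}

// Hypothetical helper (not part of the question): fills two runs back to back.
inline std::vector<int> two_runs()
{
    std::vector<int> v(6);
    // The returned iterator marks where the first run ended...
    auto mid = iota_n(v.begin(), 3, 10);   // writes 10 11 12
    // ...so the second run can start there without recomputing an offset.
    auto end = iota_n(mid, 3, 100);        // writes 100 101 102
    assert(end == v.end());
    return v;
}
```

With a void return type, as std::generate_n had in C++98, the caller would have to advance the iterator by hand before the second call.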
As a random example, I compiled the following code with g++ -S -O2 -masm=intel (GCC 4.7.1, x86_32):
void fill_it_up(int n, int * p, int val)
{
    asm volatile("DEBUG1");
    iota_n(p, n, val);
    asm volatile("DEBUG2");
    iota_m(p, n, val);
    asm volatile("DEBUG3");
    for (int i = 0; i != n; ++i) { *p++ = val++; }
    asm volatile("DEBUG4");
}
Here iota_n is the first version and iota_m the second. In all three cases, the assembly is this:
test edi, edi
jle .L4
mov edx, eax
neg edx
lea ebx, [esi+edx*4]
mov edx, eax
lea ebp, [edi+eax]
.p2align 4,,7
.p2align 3
.L9:
lea ecx, [edx+1]
cmp ecx, ebp
mov DWORD PTR [ebx-4+ecx*4], edx
mov edx, ecx
jne .L9
With -O3, the three versions are also very similar, but a lot longer (using conditional moves, punpcklqdq and the like).
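Separately from the generated assembly, it is easy to sanity-check that the three variants write the same values. The following is only a sketch: iota_m is the name used above for the std::generate_n version, while the helper variants_agree is hypothetical, and it assumes a non-negative n.

```cpp
#include <algorithm>
#include <vector>

// First version: direct loop.
template<class OutputIterator, class Size, class T>
OutputIterator iota_n(OutputIterator first, Size n, T value)
{
    while (n--)
        *first++ = value++;
    return first;
}

// Second version: delegation to std::generate_n.
template<class OutputIterator, class Size, class T>
OutputIterator iota_m(OutputIterator first, Size n, T value)
{
    return std::generate_n(first, n, [&]() { return value++; });
}

// Hypothetical helper: runs both versions plus the hand-written loop
// and reports whether all three produced identical output.
// Assumes n >= 0.
inline bool variants_agree(int n, int val)
{
    std::vector<int> a(n), b(n), c(n);
    iota_n(a.begin(), n, val);
    iota_m(b.begin(), n, val);
    int* p = c.data();
    int v = val;
    for (int i = 0; i != n; ++i) { *p++ = v++; }
    return a == b && b == c;
}
```

This only checks behaviour, of course; whether the code generation matches still has to be read off the assembly for the compiler and flags in question.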