I'm looking to make an iterator that is very similar to std::ranges::views::iota but making it work in 3D instead. It will not have a container underneath because I don't need to save the indexes into an array, I just need to go through them.
I need to make a function like this operate with parallel std::for_each instead:
int data [n][n][n]= {};
for (int i = 0; i < n; i++)
for(int j = 0; j < n; j++)
for(int k = 0; k < n; k++)
data[i][j][k] = func(i, j, k);
I can do this right now with using a single for_each:
#define IDX1D3D(dim, i) ({ (int)((i) / ( (dim) * (dim) )), (int)( ((i) / (dim) ) % (dim) ), (int)((i) % (dim)) })
int data [n * n * n]= {};
auto range = std::ranges::views::iota((size_t) 0, (size_t) (n * n * n));
std::for_each(
std::execution::par_unseq,
range.begin(),
range.end(),
[&data](const size_t &index) {
const std::array<int, 3> index3D = IDX1D3D(n, index);
data[index] = func(index3D[0], index3D[1], index[2]);
});
However, this is costly because to convert a 1D index representing a 3D position you need to do 4 divisions. I want to skip having to do these divisions.
I have already defined a class and their iterator:
#include <iostream>
class Index3D {
public:
union {
struct {
int x{};
int y{};
int z{};
} pos;
int n{};
} attr;
Index3D() = default;
explicit Index3D(int x, int y, int z)
{
attr.pos.x = x;
attr.pos.y = y;
attr.pos.z = z;
}
explicit Index3D(int n){
attr.n = n;
}
class Iterator {
int ix, iy, iz, n;
public:
Iterator(int x, int y, int z, int n) : ix(x), iy(y), iz(z), n(n) {}
Index3D operator*() const {
return Index3D{ix, iy, iz};
}
Iterator& operator++() {
++iz;
if (iz >= n) {
iz = 0;
++iy;
if (iy >= n) {
iy = 0;
++ix;
}
}
return *this;
}
bool operator!=(const Iterator& other) const {
return iz != other.iz || iy != other.iy || ix != other.ix;
}
};
Iterator begin() const {
return {0, 0, 0, attr.n};
}
Iterator end() const {
return {attr.n, 0, 0, attr.n};
}
};
int main() {
int n = 3;
Index3D index3D(n);
std::for_each(
index3D.begin(),
index3D.end(),
[](auto index){
std::cout << "Index: (" << index.attr.pos.x << ", " << index.attr.pos.y << ", " << index.attr.pos.z << ")\n";
}
);
return 0;
}
And as you can see, it works well enough in this case. But I'm not providing the execution policy so this runs sequentially instead of in parallel, which I would want for my tasks.
When I insert the execution policy, it fails to compile:
std::for_each(
std::execution::par_unseq,
index3D.begin(),
index3D.end(),
[](auto index){
std::cout << "Index: (" << index.attr.pos.x << ", " << index.attr.pos.y << ", " << index.attr.pos.z << ")\n";
}
);
Here are the errors I get from the compiler:
error: no type named 'iterator_category' in 'struct std::iterator_traits<Index3D::Iterator>'
error: no matching function for call to '__is_vectorization_preferred<const __pstl::execution::v1::parallel_policy&, Index3D::Iterator>(const __pstl::execution::v1::parallel_policy&)'
error: no matching function for call to '__is_parallelization_preferred<const __pstl::execution::v1::parallel_policy&, Index3D::Iterator>(const __pstl::execution::v1::parallel_policy&)'
error: no type named 'type' in 'struct __pstl::__internal::__is_random_access_iterator<Index3D::Iterator>'
I am considering just using 3 for_eaches inside of each other and using the std::ranges::views::iota for all three of them and simply 'passing' the current 'higher' index using the capture clauses:
auto range = std::ranges::views::iota(0, n);
std::for_each(
std::execution::par_unseq,
range.begin(),
range.end(),
[&range](const int & indexI){
std::for_each(
std::execution::par_unseq,
range.begin(),
range.end(),
[&range, &indexI](const int & indexJ){
std::for_each(
std::execution::par_unseq,
range.begin(),
range.end(),
[&range, &indexI, &indexJ](const int & indexK){
std::cout << "Index: (" << indexI << ", " << indexJ << ", " << indexK << ")\n";
}
);
}
);
}
);
This works but seems too clunky to be reasonable. I have no idea how much the overhead would be for something like this but it can't be negligible.
one simple solution would be std::views::cartesian_product
:
constexpr auto N = /**/;
array<array<array<int, N>, N>, N> arr;
auto constexpr idx = std::views::iota (0,size(arr));
auto constexpr idx_3d=std::views::cartesian_product(idx, idx, idx);
std::for_each(
std::execution::par_unseq,
std::ranges::begin(idx_3d),
std::ranges::end(idx_3d),
[&arr](auto const ijk){
auto& [i, j, k] = ijk;
use(arr[i][j][k], i, j, k);
});
But this is only good if N
is a compile time constant. From a performance perspective, two integer divisions on index are not a major issue; because they don't break the pipe line and won't cause cache-misses. It can eventually becmore benefical by removing extra branching that confuses hardware branch predictor:
std::vector<int> arr(n * n * n);
auto zip_n = arr | std::views::enumerate;
std::for_each(
std::execution::par_unseq,
std::ranges::begin(zip_n),
std::ranges::end(zip_n),
[n](auto ijk_x){
auto& [ijk, x] = ijk_x;
auto const k = std::size_t(ijk) % n;
auto const ij= std::size_t(ijk) / n;
auto const j = ij % n;
auto const i = ij / n;
use(x, i, j, k);
});
Notice that despite the looks, it only triggers two divisions per iteration. The reason is that each division opcode computes both remainder and quotient. If either one is not used, it is discarded. In this snippet we use both computation results instantly; so just 2 divisions take place.
The only problem is a cast from index to std::size_t
; it's a design issue with std::ranges::enumerate_view
, who may result in integral types larger than std::size_t
. You can avoid it by simply using views::zip
instead:
std::vector<int> arr(n * n * n);
auto idx = std::views::iota(0ull, size(arr));
auto zip_n = std::views::zip(idx, arr);
===================== Or If C++20 is the limit:
auto idx = std::views::iota(0ull, size(arr));
//The function object to use in loop:
[&arr, n](auto const ijk){
auto const k = ijk % n;
auto const ij= ijk / n;
auto const j = ij % n;
auto const i = ij / n;
use(arr[ijk], i, j, k);
});
=====================
Tips:
notice that C++ std doesn't support VLA, and for large arrays, heap is the better choice. That's why my 2 last snippets use std:vector
(automatic heap array) and the first uses std::array
(simple lightweight wrapper around C array).
In C++ we tend to avoid preprocessor as much as possible. Using macros is not recommended. For the use case in OP, inline
functions or just lambdas are far better choices:
constexpr auto break_idx(auto ijk, auto n){
auto const k = ijk % n;
auto const ij= ijk / n;
auto const j = ij % n;
auto const i = ij / n;
return std::tuple{i, j, k};
};
//Use with structured binding:
auto const[i, j, k] = break_index(ijk, n);