I would like to perform a transformation column-wise on an Armadillo matrix, using std::for_each or a similar method from the algorithm header. The reason for this is that the actual transformation I will perform is heavy, and hence likely to benefit from parallelization. In addition, I need to process the matrix one column at a time when running this in parallel, since the transformation function depends on all the values in a given colum, but not the values in other columns.
Below is a workaround that does work, in this small example for a 4 x 3 matrix. Rather than using an arma::mat
of dimension 4 x 3, I define a std::vector
of size 3, whose elements are arma::vec
of size 4. Hence I implicitly have a 4 x 3 matrix.
#include <armadillo>
#include <iostream>
#include <algorithm>
int main()
{
std::vector<arma::vec> v{arma::randu(4), arma::randu(4), arma::randu(4)};
for(auto x : v) {
std::cout << x << std::endl;
}
std::for_each(v.begin(), v.end(), [](arma::vec& x){ x = arma::ones(4); });
for(auto x : v) {
std::cout << x << std::endl;
}
return 0;
}
This does work, but as I'm using linear algebra functions in other places of my code, it would be more convenient to replace the following line:
std::vector<arma::vec> v{arma::randu(4), arma::randu(4), arma::randu(4)};
with something like this
arma::mat v{4, 3, arma::randu};
and then following by something like
// does not work! Gives a double, and iterates over all elements.
std::for_each(v.begin(), v.end(), [](arma::vec& x) {x = foo(); });
However, in this case I don't know how to define the iterator given to std::for_each
so that it works one column at a time, and in addition gives an arma::vec
. The iterators described in the Armadillo documentation seem to give me the start of a given column, but then iterate one element at a time.
I am also aware of the for_each
member functions provided in Armadillo, which would work sequentially, but the goal here is to perform parallel processing by eventually using the overload of std::for_each
taking an execution policy as its first argument.
The number of rows is available in arma::mat.n_rows
and you get iterators to each row with v.begin_row(row_index)
and v.end_row(row_index)
.
If you want to use the parallel execution policy on the outer vector in v
, you could do something like this (which just prints the values):
#include <armadillo>
#include <boost/iterator/counting_iterator.hpp>
//
#include <algorithm>
#include <execution>
#include <iostream>
int main() {
arma::mat v{4, 3, arma::fill::randu};
// create a pair of counting iterator to iterate through
// the indices [0, v.n_rows) :
boost::counting_iterator<std::size_t> beg(0);
boost::counting_iterator<std::size_t> end(v.n_rows);
// for_each with an execution policy on the outer part of v:
std::for_each(std::execution::par, beg, end, [&](std::size_t idx) {
std::for_each(v.begin_row(idx), v.end_row(idx),
[](auto&& colval) {
std::cout << colval << ' ';
});
std::cout << '\n';
});
}
If you don't want to use boost, you can create your own counting iterator - or fill a std::vector<std::size_t>
with the indices you'd like to iterate over. Filling that takes a little bit of time so if you need to do it often, use a counting iterator.
In C++23 you could also use the normal begin()
and end()
iterators of the arma::mat
and chunk it up in v.n_cols
sized sub-ranges using std::views::chunk
:
#include <armadillo>
//
#include <algorithm>
#include <execution>
#include <iostream>
#include <ranges>
int main() {
arma::mat v{4, 3, arma::fill::randu};
auto cv = std::views::chunk(v, v.n_cols);
std::for_each(std::execution::par, cv.begin(), cv.end(),
[](auto&& subrange) {
for(auto colval : subrange) {
std::cout << colval << ' ';
}
std::cout << '\n';
});
}
If you instead want to use the parallel execution policies on the inner vector in v
it's much simpler:
for(std::size_t idx = 0; idx < v.n_rows; ++idx) {
// execution policy on the inner part of v:
std::for_each(std::execution::par, v.begin_row(idx), v.end_row(idx), ...);
}