c++armadillo

Can I use std::for_each to transform an arma::mat one column at a time?


I would like to perform a transformation column-wise on an Armadillo matrix, using std::for_each or a similar method from the algorithm header. The reason for this is that the actual transformation I will perform is heavy, and hence likely to benefit from parallelization. In addition, I need to process the matrix one column at a time when running this in parallel, since the transformation function depends on all the values in a given colum, but not the values in other columns.

Below is a workaround that does work, in this small example for a 4 x 3 matrix. Rather than using an arma::mat of dimension 4 x 3, I define a std::vector of size 3, whose elements are arma::vec of size 4. Hence I implicitly have a 4 x 3 matrix.

#include <armadillo>
#include <iostream>
#include <algorithm>

int main()
{
  std::vector<arma::vec> v{arma::randu(4), arma::randu(4), arma::randu(4)};

  for(auto x : v) {
    std::cout << x << std::endl;
  }
  
  std::for_each(v.begin(), v.end(), [](arma::vec& x){ x = arma::ones(4); });

  for(auto x : v) {
    std::cout << x << std::endl;
  }
  
  return 0;
}

This does work, but as I'm using linear algebra functions in other places of my code, it would be more convenient to replace the following line:

std::vector<arma::vec> v{arma::randu(4), arma::randu(4), arma::randu(4)};

with something like this

arma::mat v{4, 3, arma::randu};

and then following by something like

// does not work! Gives a double, and iterates over all elements.
std::for_each(v.begin(), v.end(), [](arma::vec& x) {x = foo(); });

However, in this case I don't know how to define the iterator given to std::for_each so that it works one column at a time, and in addition gives an arma::vec. The iterators described in the Armadillo documentation seem to give me the start of a given column, but then iterate one element at a time.

I am also aware of the for_each member functions provided in Armadillo, which would work sequentially, but the goal here is to perform parallel processing by eventually using the overload of std::for_each taking an execution policy as its first argument.


Solution

  • The number of rows is available in arma::mat.n_rows and you get iterators to each row with v.begin_row(row_index) and v.end_row(row_index).

    If you want to use the parallel execution policy on the outer vector in v, you could do something like this (which just prints the values):

    #include <armadillo>
    #include <boost/iterator/counting_iterator.hpp>
    //
    #include <algorithm>
    #include <execution>
    #include <iostream>
    
    int main() {
        arma::mat v{4, 3, arma::fill::randu};
    
        // create a pair of counting iterator to iterate through
        // the indices [0, v.n_rows) :
        boost::counting_iterator<std::size_t> beg(0);
        boost::counting_iterator<std::size_t> end(v.n_rows);
    
        // for_each with an execution policy on the outer part of v:
        std::for_each(std::execution::par, beg, end, [&](std::size_t idx) {
            std::for_each(v.begin_row(idx), v.end_row(idx),
                          [](auto&& colval) {
                              std::cout << colval << ' '; 
                          });
            std::cout << '\n';
        });
    }
    

    If you don't want to use boost, you can create your own counting iterator - or fill a std::vector<std::size_t> with the indices you'd like to iterate over. Filling that takes a little bit of time so if you need to do it often, use a counting iterator.

    In C++23 you could also use the normal begin() and end() iterators of the arma::mat and chunk it up in v.n_cols sized sub-ranges using std::views::chunk:

    #include <armadillo>
    //
    #include <algorithm>
    #include <execution>
    #include <iostream>
    #include <ranges>
    
    int main() {
        arma::mat v{4, 3, arma::fill::randu};
    
        auto cv = std::views::chunk(v, v.n_cols);
    
        std::for_each(std::execution::par, cv.begin(), cv.end(),
                      [](auto&& subrange) {
                          for(auto colval : subrange) {
                              std::cout << colval << ' ';
                          }
                          std::cout << '\n';
                      });
    }
    

    If you instead want to use the parallel execution policies on the inner vector in v it's much simpler:

    for(std::size_t idx = 0; idx < v.n_rows; ++idx) {
        // execution policy on the inner part of v:
        std::for_each(std::execution::par, v.begin_row(idx), v.end_row(idx), ...);
    }