c++listrcppfatal-errorrcppparallel

Why does Rcpp + RcppParallel with Rcpp::List throw me a fatal error?


I run R + Rcpp + RcppParallel from RStudio.

I run a parallel loop that works if I do not use a Rcpp::List in the worker, but throws an abort if I use the Rcpp::List in the worker (even in a trivial way).

picture of the fatal abort of R Studio

Am I not allowed to use a List in a worker?

Minimal example:

The R file

sourceCpp('minimal_example.cpp')

nn = 1e2
## the following works nicely
my_test = list_paral_flawed(nn, 2)
## the following throws a fatal error
my_test = list_paral_flawed(nn, 1)

The corresponding file minimal_example.cpp (note the minimal change between struct_A1 and struct_A2)

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppParallel)]]
#include <RcppParallel.h>
using namespace Rcpp;
using namespace arma; 
using namespace RcppParallel;

struct struct_A1 : public Worker {
  // input matrix to write to
  arma::mat &out_mat;
  
  struct_A1(arma::mat &out_mat): out_mat(out_mat) {}
  
  void operator()(std::size_t begin, std::size_t end) {
    int dim_T = out_mat.n_rows;
    for (std::size_t i = begin; i < end; i++) {
      // just defining a List with a zero vector, getting it out,
      // and putting it into the columns of the output matrix;
      // rest same as struct_A1
      Rcpp::List tmp = Rcpp::List::create(
        Named("v1") = arma::vec(dim_T, fill::zeros)
      );
      arma::vec v1 = tmp["v1"];
      for (int i1=0;i1<dim_T;++i1) out_mat(i1, i) = i1 + i + v1(i1);
    }

  }
};

struct struct_A2 : public Worker {
  // input matrix to write to
  arma::mat &out_mat;
  
  struct_A2(arma::mat &out_mat): out_mat(out_mat) {}
  
  void operator()(std::size_t begin, std::size_t end) {
    int dim_T = out_mat.n_rows;
    for (std::size_t i = begin; i < end; i++) {
      // same as in struct_A1, but now without the list, just
      // directly making a vector of zeros and putting it into
      // the columns of the output matrix
      arma::vec v1(dim_T, fill::zeros);
      for (int i1=0;i1<dim_T;++i1) out_mat(i1, i) = i1 + i + v1(i1);
    }
    
  }
};

// [[Rcpp::export]]
arma::mat list_paral_flawed(const int dim_N, const int version_x) {

  // construct the output matrix
  arma::mat out_mat(dim_N, dim_N);

  // depending on version_x, do struct_A1 or struct_A2
  // as the parallel loop
  if (version_x == 1) {
    Rcout << "\nDoing version A1\n";
    struct_A1 a2(out_mat);
    parallelFor(0, dim_N, a2);
  } else {
    Rcout << "\nDoing version A2\n";
    struct_A2 a2(out_mat);
    parallelFor(0, dim_N, a2);
  }

  return out_mat;
}

Now, list_paral_flawed(100, 2) works perfectly fine, whereas list_paral_flawed(100, 1) throws an abort on my machine.

NB: putting the number of workers in the R code to 1 using RcppParallel::setThreadOptions(numThreads = 1) avoids the error, which makes me suspect it is some type of memory allocation issue ...

Any help much appreciated


Solution

  • It may be as simple as having skipped the recommendation

    The code that you write within parallel workers should not call the R or Rcpp API in any fashion.

    from section 'Thread Safety' of the (generally excellent) RcppParallel documentation site. 'Writing R Extensions' has similar recommendations. All this boils down to 'cannot use any data structure / container with R memory' as we cannot know when gc() gets called and we get back into single-threaded R.

    So in short, to be on the safe side you need have data in non-R data structures to permit parallel work on that data. That is what the examples in RcppParallel do. They are worth studying in detail.