I run R + Rcpp + RcppParallel from RStudio.
I run a parallel loop that works if I do not use a Rcpp::List in the worker, but throws an abort if I use the Rcpp::List in the worker (even in a trivial way).
Am I not allowed to use a List in a worker?
Minimal example:
The R file
sourceCpp('minimal_example.cpp')
nn = 1e2
## the following works nicely
my_test = list_paral_flawed(nn, 2)
## the following throws a fatal error
my_test = list_paral_flawed(nn, 1)
The corresponding file minimal_example.cpp
(note the minimal change between struct_A1 and struct_A2)
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppParallel)]]
#include <RcppParallel.h>
using namespace Rcpp;
using namespace arma;
using namespace RcppParallel;
struct struct_A1 : public Worker {
// input matrix to write to
arma::mat &out_mat;
struct_A1(arma::mat &out_mat): out_mat(out_mat) {}
void operator()(std::size_t begin, std::size_t end) {
int dim_T = out_mat.n_rows;
for (std::size_t i = begin; i < end; i++) {
// just defining a List with a zero vector, getting it out,
// and putting it into the columns of the output matrix;
// rest same as struct_A1
Rcpp::List tmp = Rcpp::List::create(
Named("v1") = arma::vec(dim_T, fill::zeros)
);
arma::vec v1 = tmp["v1"];
for (int i1=0;i1<dim_T;++i1) out_mat(i1, i) = i1 + i + v1(i1);
}
}
};
struct struct_A2 : public Worker {
// input matrix to write to
arma::mat &out_mat;
struct_A2(arma::mat &out_mat): out_mat(out_mat) {}
void operator()(std::size_t begin, std::size_t end) {
int dim_T = out_mat.n_rows;
for (std::size_t i = begin; i < end; i++) {
// same as in struct_A1, but now without the list, just
// directly making a vector of zeros and putting it into
// the columns of the output matrix
arma::vec v1(dim_T, fill::zeros);
for (int i1=0;i1<dim_T;++i1) out_mat(i1, i) = i1 + i + v1(i1);
}
}
};
// [[Rcpp::export]]
arma::mat list_paral_flawed(const int dim_N, const int version_x) {
// construct the output matrix
arma::mat out_mat(dim_N, dim_N);
// depending on version_x, do struct_A1 or struct_A2
// as the parallel loop
if (version_x == 1) {
Rcout << "\nDoing version A1\n";
struct_A1 a2(out_mat);
parallelFor(0, dim_N, a2);
} else {
Rcout << "\nDoing version A2\n";
struct_A2 a2(out_mat);
parallelFor(0, dim_N, a2);
}
return out_mat;
}
Now, list_paral_flawed(100, 2)
works perfectly fine, whereas list_paral_flawed(100, 1)
throws an abort on my machine.
NB: putting the number of workers in the R code to 1 using RcppParallel::setThreadOptions(numThreads = 1)
avoids the error, which makes me suspect it is some type of memory allocation issue ...
Any help much appreciated
It may be as simple as having skipped the recommendation
The code that you write within parallel workers should not call the R or Rcpp API in any fashion.
from section 'Thread Safety' of the (generally excellent) RcppParallel documentation site. 'Writing R Extensions' has similar recommendations. All this boils down to 'cannot use any data structure / container with R memory' as we cannot know when gc()
gets called and we get back into single-threaded R.
So in short, to be on the safe side you need have data in non-R data structures to permit parallel work on that data. That is what the examples in RcppParallel do. They are worth studying in detail.