r, rcpp, parallel-foreach

foreach with Rcpp in R package error: <simpleError in .Call("<function_name>"..."<function name>" not available for .Call() for package "<package>">


I am trying to parallelize Rcpp code. From this post, I was able to get my MRE to run and produce the expected output by just sourcing the functions:

> Rcpp::sourceCpp("src/rnorm_c.cpp")
> source("~/<path to project folder>/rnormpar/R/normal_mat.R")
> norm_mat_par()
[[1]]
           [,1]
[1,] -0.1117342

[[2]]
           [,1]
[1,] 0.05094005

[[3]]
          [,1]
[1,] 0.1137641

[[4]]
          [,1]
[1,] 0.8624004

[[5]]
          [,1]
[1,] 0.7821107

However, after building and running the function from within the package, the output changed to:

Restarting R session...

> library(rnormpar)
> rnormpar::norm_mat_par()
[[1]]
<simpleError in .Call("_rnormpar_rnorm_n", PACKAGE = "rnormpar",     n, mu, sd): "_rnormpar_rnorm_n" not available for .Call() for package "rnormpar">

[[2]]
<simpleError in .Call("_rnormpar_rnorm_n", PACKAGE = "rnormpar",     n, mu, sd): "_rnormpar_rnorm_n" not available for .Call() for package "rnormpar">

[[3]]
<simpleError in .Call("_rnormpar_rnorm_n", PACKAGE = "rnormpar",     n, mu, sd): "_rnormpar_rnorm_n" not available for .Call() for package "rnormpar">

[[4]]
<simpleError in .Call("_rnormpar_rnorm_n", PACKAGE = "rnormpar",     n, mu, sd): "_rnormpar_rnorm_n" not available for .Call() for package "rnormpar">

[[5]]
<simpleError in .Call("_rnormpar_rnorm_n", PACKAGE = "rnormpar",     n, mu, sd): "_rnormpar_rnorm_n" not available for .Call() for package "rnormpar">

Here is the code for my MRE. It consists of two scripts. The first is the Rcpp code:

#include <RcppArmadillo.h>
//[[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;

// function to generate a single sample from the standard normal distribution
//[[Rcpp::export]]
double rnorm1() {
  return (double)arma::vec(1, arma::fill::randn)(0, 0);
}

// function to return a vector of n samples from the normal distribution
//[[Rcpp::export]]
arma::vec rnorm_n(int n = 1, double mu = 0, double sd = 1){

  arma::vec res(n);

  for (int j = 0; j < n; j++){
    res(j) = rnorm1();
  }

  res = res * sd + mu;

  return res;
}

The second is the R code:

# generates a matrix distributed independent normal
# takes n, p, mean vector, and sd vector representing the diagonal of the
# covariance matrix
#' Normal matrix
#'
#' @param n sample size
#' @param p number of variables
#' @param mu mean vector
#' @param sd diagonal of the covariance matrix
#'
#' @return normal matrix
#' @export
#'
#' @examples norm_mat(1e2, 3, -1:1, 1:3)
norm_mat <- function(n = 1, p = 1, mu = rep(0, p), sd = rep(1, p)){

  res <- matrix(NA, n, p)

  for(j in 1:p){
    res[ , j] <- rnorm_n(n, mu[j], sd[j])
  }

  return(res)

}

#' Title
#'
#' @return
#' @export
#'
#' @examples
norm_mat_par <- function(){

  nworkers <- parallel::detectCores() - 1

  cl <- parallel::makeCluster(nworkers)

  doParallel::registerDoParallel(cl)

  x <- foreach::`%dopar%`(
    foreach::foreach(j = 1:5, .errorhandling='pass', .export = "norm_mat",
                     .noexport = c("rnorm_n", "rnorm1"), .packages = c("Rcpp")),
    {
      sourceCpp("src/rnorm_c.cpp")
      norm_mat()
    })

  parallel::stopCluster(cl)

  return(x)
}

This is the GitHub repo for my MRE.

Thanks in advance to everyone taking the time to respond!


Solution

  • The GitHub repo rcpp-and-doparallel provided the solution.

    Here is how I modified my package. The corresponding commit in the rnormpar repo has the commit message "Solved parallelization".

    First, I modified the R script titled rnorm_package.R that I created for registering my cpp functions to mirror that of the rcpp-and-doparallel package:

    #' @keywords internal
    "_PACKAGE"
    
    # The following block is used by usethis to automatically manage
    # roxygen namespace tags. Modify with care!
    ## usethis namespace: start
    #' @useDynLib rnormpar, .registration = TRUE
    #' @importFrom Rcpp sourceCpp
    ## usethis namespace: end
    NULL
    

    I then deleted and re-generated my NAMESPACE using devtools::document(). This caused the following lines to be added to NAMESPACE:

    importFrom(Rcpp,sourceCpp)
    useDynLib(rnormpar, .registration = TRUE)
    

    If these lines are already in the NAMESPACE, then the first two steps are perhaps not necessary.
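
    A quick way to check whether the useDynLib directive has taken effect is to ask R if the shared library is loaded and exposes the routine named in the error message. The sketch below is only an illustration: routine_available is a hypothetical helper that is not part of rnormpar, and it relies only on base R's getLoadedDLLs() and is.loaded().

    ```r
    # Hypothetical helper (not part of rnormpar): TRUE when the shared library
    # for `pkg` is loaded in this session and exposes the native symbol `symbol`.
    routine_available <- function(symbol, pkg) {
      pkg %in% names(getLoadedDLLs()) && is.loaded(symbol, PACKAGE = pkg)
    }

    # After library(rnormpar), this should be TRUE once useDynLib is in NAMESPACE:
    # routine_available("_rnormpar_rnorm_n", "rnormpar")

    # A package whose shared library is not loaded always gives FALSE.
    routine_available("_rnormpar_rnorm_n", "no_such_package")
    ```

    Running the same check on a worker (for example inside the foreach body) would show whether the routine is visible there as well.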

    Finally, I modified the arguments to foreach so that each worker loads my package (via .packages = "rnormpar") instead of calling sourceCpp() on the source file:

    norm_mat_par <- function(){
    
      nworkers <- parallel::detectCores() - 1
    
      cl <- parallel::makeCluster(nworkers)
    
      doParallel::registerDoParallel(cl)
    
      x <- foreach::`%dopar%`(
        foreach::foreach(j = 1:5, .packages = "rnormpar"),
        {
          norm_mat()
        })
    
      parallel::stopCluster(cl)
    
      return(x)
    }
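
    Each worker created by makeCluster is a separate R process, so a package has to be loaded there before its functions can be used. The sketch below illustrates that with the parallel package alone; it assumes a default R setup, uses 2 workers as an arbitrary choice, and uses the splines package (which ships with R) as a stand-in for rnormpar. The library() call is roughly what .packages asks each worker to do.

    ```r
    # Start two fresh R worker processes (2 is an arbitrary choice for the demo).
    cl <- parallel::makeCluster(2)

    # "splines" ships with R and stands in for rnormpar here.
    attached_before <- parallel::clusterEvalQ(cl, "package:splines" %in% search())

    # Attach the package on every worker, then check again.
    invisible(parallel::clusterEvalQ(cl, library(splines)))
    attached_after <- parallel::clusterEvalQ(cl, "package:splines" %in% search())

    parallel::stopCluster(cl)

    unlist(attached_before)  # typically FALSE on each worker in a default setup
    unlist(attached_after)   # one logical value per worker
    ```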
    

    After building the package, the function produces the expected output:

    Restarting R session...
    
    > library(rnormpar)
    > rnormpar::norm_mat_par()
    [[1]]
              [,1]
    [1,] -1.948502
    
    [[2]]
               [,1]
    [1,] -0.2774582
    
    [[3]]
              [,1]
    [1,] 0.1710537
    
    [[4]]
             [,1]
    [1,] 1.784761
    
    [[5]]
               [,1]
    [1,] -0.5694733