I am trying to parallelize Rcpp
code. From this post, I was able to get my MRE to run and produce the expected output by just sourcing the functions:
> Rcpp::sourceCpp("src/rnorm_c.cpp")
> source("~/<path to project folder>/rnormpar/R/normal_mat.R")
> norm_mat_par()
[[1]]
[,1]
[1,] -0.1117342
[[2]]
[,1]
[1,] 0.05094005
[[3]]
[,1]
[1,] 0.1137641
[[4]]
[,1]
[1,] 0.8624004
[[5]]
[,1]
[1,] 0.7821107
However, after building and running the function from within the package, the output changed to:
Restarting R session...
> library(rnormpar)
> rnormpar::norm_mat_par()
[[1]]
<simpleError in .Call("_rnormpar_rnorm_n", PACKAGE = "rnormpar", n, mu, sd): "_rnormpar_rnorm_n" not available for .Call() for package "rnormpar">
[[2]]
<simpleError in .Call("_rnormpar_rnorm_n", PACKAGE = "rnormpar", n, mu, sd): "_rnormpar_rnorm_n" not available for .Call() for package "rnormpar">
[[3]]
<simpleError in .Call("_rnormpar_rnorm_n", PACKAGE = "rnormpar", n, mu, sd): "_rnormpar_rnorm_n" not available for .Call() for package "rnormpar">
[[4]]
<simpleError in .Call("_rnormpar_rnorm_n", PACKAGE = "rnormpar", n, mu, sd): "_rnormpar_rnorm_n" not available for .Call() for package "rnormpar">
[[5]]
<simpleError in .Call("_rnormpar_rnorm_n", PACKAGE = "rnormpar", n, mu, sd): "_rnormpar_rnorm_n" not available for .Call() for package "rnormpar">
Here is the code for my MRE. It consists of two scripts. The first is the Rcpp
code:
#include <RcppArmadillo.h>
//[[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// function to generate a single sample from the standard normal distribution
//[[Rcpp::export]]
double rnorm1() {
return (double)arma::vec(1, arma::fill::randn)(0, 0);
}
// function to return a vector of n samples from the normal distribution
//[[Rcpp::export]]
arma::vec rnorm_n(int n = 1, double mu = 0, double sd = 1){
arma::vec res(n);
for (int j = 0; j < n; j++){
res(j) = rnorm1();
}
res = res * sd + mu;
return res;
}
The second is the R
code:
# generates a matrix distributed independent normal
# takes n, p, mean vector, and sd vector representing the diagonal of the
# covariance matrix
#' Normal matrix
#'
#' @param n sample size
#' @param p number of variables
#' @param mu mean vector
#' @param sd diagonal of the covariance matrix
#'
#' @return normal matrix
#' @export
#'
#' @examples norm_mat(1e2, 3, -1:1, 1:3)
norm_mat <- function(n = 1, p = 1, mu = rep(0, p), sd = rep(1, p)){
res <- matrix(NA, n, p)
for(j in 1:p){
res[ , j] <- rnorm_n(n, mu[j], sd[j])
}
return(res)
}
#' Title
#'
#' @return
#' @export
#'
#' @examples
norm_mat_par <- function(){
nworkers <- parallel::detectCores() - 1
cl <- parallel::makeCluster(nworkers)
doParallel::registerDoParallel(cl)
x <- foreach::`%dopar%`(
foreach::foreach(j = 1:5, .errorhandling='pass', .export = "norm_mat",
.noexport = c("rnorm_n", "rnorm1"), .packages = c("Rcpp")),
{
sourceCpp("src/rnorm_c.cpp")
norm_mat()
})
parallel::stopCluster(cl)
return(x)
}
This is the github repo for my MRE
Thanks in advance to everyone taking the time to respond!
The GitHub repo rcpp-and-doparallel provided the solution.
I will demonstrate here how I modified my package - the corresponding commit in the rnormpar
repo has commit message "Solved parallelization".
First, I modified the R script titled rnorm_package.R
that I created for registering my cpp
functions to mirror that of the rcpp-and-doparallel
package:
#' @keywords internal
"_PACKAGE"
# The following block is used by usethis to automatically manage
# roxygen namespace tags. Modify with care!
## usethis namespace: start
#' @useDynLib rnormpar, .registration = TRUE
#' @importFrom Rcpp sourceCpp
## usethis namespace: end
NULL
I then deleted and re-generated my NAMESPACE
using devtools::document()
. This caused the following lines to be added to NAMESPACE
:
importFrom(Rcpp,sourceCpp)
useDynLib(rnormpar, .registration = TRUE)
If these lines are already in the NAMESPACE
, then the first two steps are perhaps not necessary.
Finally, I modified the arguments to the foreach
function so that my package was passed to the workers:
norm_mat_par <- function(){
nworkers <- parallel::detectCores() - 1
cl <- parallel::makeCluster(nworkers)
doParallel::registerDoParallel(cl)
x <- foreach::`%dopar%`(
foreach::foreach(j = 1:5, .packages = "rnormpar"),
{
norm_mat()
})
parallel::stopCluster(cl)
return(x)
}
After building the package, the function produces the expected output:
Restarting R session...
> library(rnormpar)
> rnormpar::norm_mat_par()
[[1]]
[,1]
[1,] -1.948502
[[2]]
[,1]
[1,] -0.2774582
[[3]]
[,1]
[1,] 0.1710537
[[4]]
[,1]
[1,] 1.784761
[[5]]
[,1]
[1,] -0.5694733