r, linux, environment-variables, parallel-foreach

Dynamic library dependencies not recognized when run in parallel under R foreach()


I'm using the Rfast package, which imports the package RcppZiggurat. I'm running R 3.6.3 on a Linux cluster (Red Hat 6.1). The packages are installed in my local library directory, but R itself is installed system-wide.

The Rfast functions (e.g. colsums()) work well when I call them directly. The problem appears when I call them in a foreach() loop like the following (EDIT: I added the code to register the cluster, as pointed out by Rui Barradas, but it didn't fix the problem):

library(Rfast)
library(doParallel)
library(foreach)

cores <- detectCores()
cl <- makeCluster(cores)
registerDoParallel(cl)

A <- matrix(rnorm(1e6), 1000, 1000)
cm <- foreach(n = 1:4, .packages = 'Rfast') %dopar% colmeans(A)

stopCluster(cl)

then I get an error:

unable to load shared object '/home/users/sutd/R/x86_64-pc-linux-gnu-library/3.6/RcppZiggurat/libs/RcppZiggurat.so':
  libgsl.so.0: cannot open shared object file: No such file or directory

Somehow, the dynamic library is recognized when called directly but not when called under foreach().

I know that libgsl.so is located in /usr/lib64/, so I've added the following line at the beginning of my R script

Sys.setenv(LD_LIBRARY_PATH=paste("/usr/lib64/", Sys.getenv("LD_LIBRARY_PATH"), sep = ":"))

But it didn't work.

I have also tried to do dyn.load('/usr/lib64/libgsl.so') but I get the following error:

Error in dyn.load("/usr/lib64/libgsl.so") : unable to load shared object '/usr/lib64/libgsl.so': 
/usr/lib64/libgsl.so: undefined symbol: cblas_ctrmv

How do I make the dependencies available in the foreach() parallel loops?

NOTE

In the actual use case I am using the genetic algorithm package GA, and have GA::ga() which handles the foreach() loop, and within the loop I use a function in my own package which calls the Rfast functions. So I'm hoping that there is a solution where I don't have to modify the foreach() call.


Solution

  • Thanks to the answers by @RuiBarradas and @coatless, I realized that the problem is not with foreach(), because (1) the problem also occurred when I ran the code with future, and (2) it occurred with the foreach() code even with the wrong call, when I hadn't registered the cluster. When no cluster is registered, foreach() throws a warning and runs in sequential mode instead. But that didn't happen; the error appeared instead.

    Therefore, the problem must have occurred even before the foreach() call. In the log, the error appeared right after the message Loading package RcppZiggurat. Something must have gone wrong when this package was loaded.

    I then checked the dependencies of RcppZiggurat, and found that it depends on another package called RcppGSL, which interfaces R and the GSL library. Bingo, that's why libgsl.so.0 is needed when RcppZiggurat is loaded.

    So I made an R script named test-gsl.R, which has the following two lines.

    library(RcppZiggurat)
    print('OK')
    

    I then ran the following on the head node:

    $ module load R/3.6.3
    $ Rscript test-gsl.R
    

    Everything worked fine, and 'OK' was printed.

    But this didn't work when I submitted the job to a compute node. The PBS script, called test.sh, is as follows:

    ### Resources request
    #PBS -l select=1:ncpus=1:mem=1GB
    
    ### Walltime
    #PBS -l walltime=00:01:00
    
    echo "Working directory is $PBS_O_WORKDIR"
    cd "$PBS_O_WORKDIR"
    
    ### Run R
    module load R/3.6.3
    Rscript test-gsl.R
    

    Then I ran

    qsub test.sh
    

    And the error popped up. This means that the cause is a difference between the compute node and the head node on my system, and has nothing to do with the packages. I contacted the system administrator, who explained to me that the GSL library is available on the head node at the default path, but not on the compute node. So in my shell script, I need to add module load gsl/2.1 before running my R script. I tested that and everything worked.

    The solution seems simple enough, but I knew too little about Linux administration to realize it. Only after asking around and trying (rather blindly) many things did I finally come to this solution. So thanks to those who've offered help, and mea culpa for not being able to describe the problem accurately at the beginning.
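
A check along the following lines can show which shared library is missing on a given node before R is even started. It is a minimal sketch and not part of the original fix: the function names missing_libs and check_so are made up for illustration, and it assumes ldd and awk are available on the node. ldd lists the shared libraries a binary needs and marks the ones the dynamic linker cannot resolve as "not found".

```shell
#!/bin/bash

# Print the names of the libraries that ldd reports as "not found".
# Reads ldd output on stdin, so it can be tried on saved output as well.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Hypothetical helper: check one shared object.
# Returns 2 if the file does not exist, 1 if any dependency is
# unresolved, and 0 otherwise.
check_so() {
    local so="$1"
    local missing

    if [ ! -f "$so" ]; then
        echo "No such file: $so" >&2
        return 2
    fi

    missing="$(ldd "$so" 2>/dev/null | missing_libs)"
    if [ -n "$missing" ]; then
        echo "Unresolved libraries for $so:" >&2
        echo "$missing" >&2
        return 1
    fi
    return 0
}
```

Sourced into test.sh after the module load lines, a call such as check_so /home/users/sutd/R/x86_64-pc-linux-gnu-library/3.6/RcppZiggurat/libs/RcppZiggurat.so || exit 1 (the path taken from the error message in the question) should stop the job early and name libgsl.so.0 when the GSL module has not been loaded on the compute node.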