Tags: java, r, slurm, doparallel, system2

Using doParallel to start multiple system calls from R within a Slurm job


I am using an R script that basically pastes together command-line calls and executes them through system2(). The commands run a Java application.

Now I want to spawn multiple processes of that Java application at once, to execute some tasks on a cluster computer. The jobs are submitted via Slurm. Does it make sense to execute the system calls from within R using doParallel, with the number of cores reserved for the Slurm job? Or are there more efficient options (e.g., running multiple instances of the R script in parallel through Slurm, each spawning its own Java instances)?

I am not sure how Slurm and the parallel backend allocate resources, or how to spawn the processes most efficiently. Which process controls where the Java instances are executed in this setup?

Example Slurm job:

#!/bin/bash

#SBATCH --job-name=somejob
#SBATCH --output=somejob%a.out
#SBATCH --time=2:00:00
#SBATCH --partition=node
#SBATCH --qos=normal
#SBATCH --account=node
#SBATCH --cpus-per-task=20
#SBATCH --mem-per-cpu=3200
#SBATCH --ntasks=1
#SBATCH --array=1-12

srun R --vanilla -f somescript.R

Example R script:

#!/usr/bin/env Rscript

require("doParallel")

cl <- parallel::makeCluster(20)
doParallel::registerDoParallel(cl)

foreach::foreach(
  arg1 = 1:20, .packages = "mypackage"
  ) %dopar% {
    arg2 <- "some_arg"
    system2("/path/to/java.exe", args = c(arg1, arg2), stdout = TRUE)
  }

parallel::stopCluster(cl)  # shut down the workers once the loop has finished
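
Instead of hard-coding the 20, I assume the number of cores reserved by Slurm could be read from the job environment rather than repeated in the script; a minimal sketch (the fallback of 1 is only for running the script outside of Slurm):

n_workers <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))  # set by --cpus-per-task
cl <- parallel::makeCluster(n_workers)
doParallel::registerDoParallel(cl)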

Solution

  • The following R script and shell script run 2 R sessions, each using 32 cores, for 64 parallel executions of your Java code. You can adjust the numbers to match a different node and core configuration on your cluster.

    library(mypackage) # substitute your package name and add others
    library(pbdMPI)
    library(parallel) # provides mclapply()
    
    my_arg1 = comm.chunk(64, form = "vector") # gets arg1 instances for this rank
    
    sys_call = function(arg1) {
        arg2 <- "some_arg"
        system2("/path/to/java.exe", args = c(arg1, arg2), stdout = TRUE)
    }
    mclapply(my_arg1, sys_call, mc.cores = 32)
    
    finalize()
    

    Save the above in my_r_script.R.
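
    If your nodes do not have exactly 32 cores, one possible variant (just a sketch, not part of the script above) is to let each R session detect its own core count and pass that to mclapply(); the total handed to comm.chunk() then needs to match nodes times cores per node:

    cores_per_node <- parallel::detectCores()  # with --exclusive this is the whole node
    mclapply(my_arg1, sys_call, mc.cores = cores_per_node)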

    #!/bin/bash
    
    #SBATCH --nodes=2
    #SBATCH --exclusive
    
    module load r
    
    mpirun --map-by ppr:1:node Rscript my_r_script.R
    

    Save the above in my_script.sh and submit to Slurm with sbatch my_script.sh.

    The shell script asks for 2 nodes and exclusive access to all cores on them. OpenMPI's mpirun places 1 R session per node, and because of the exclusive allocation all cores on each node are available to mclapply().
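
    If you want to confirm that placement, a small check can be added near the top of my_r_script.R; this is only a sketch (comm.rank() and comm.print() are pbdMPI functions, and with --exclusive detectCores() should report the whole node):

    # optional sanity check: which node did each rank land on, and how many cores does it see?
    comm.print(
        paste("rank", comm.rank(), "on", Sys.info()[["nodename"]],
              "sees", parallel::detectCores(), "cores"),
        all.rank = TRUE
    )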

    You may need to load modules that provide your software environment, including module load r. These details are usually site-dependent.

    You will need additional #SBATCH parameters for account, queue, time, and possibly memory, which may vary on different clusters with different local defaults.

    Your arg1 parameter is different in each of the 64 instances and can be used to create different output file names in your Java code.
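
    If you prefer to handle the output on the R side rather than inside the Java code, system2() can also redirect stdout to a per-instance file; a sketch, with a hypothetical naming scheme built from arg1:

    sys_call = function(arg1) {
        arg2 <- "some_arg"
        outfile <- sprintf("output_%02d.txt", arg1)  # hypothetical per-instance file name
        system2("/path/to/java.exe", args = c(arg1, arg2), stdout = outfile)
    }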