rparallel-processingforkcpu-coresmclapply

how to control potential fork bomb caused by mclapply, tried ulimit but didn't work


I am using mclapply in my R script for parallel computing. It saves overall memory usage and it is fast so I want to keep it in my script. However, one thing I noticed is that the number of child processes generated during running the script is more than the number of cores I specified using mc.cores. Specifically, I am running my script on a server with 128 cores. And when I run my script, I set mc.cores to 18. During the running of the script, I checked the processes related to my script using htop. First, I can find 18 processes like this:

enter image description here

3_GA_optimization.R is my script. This all look good. But I also found more than 100 processes running at the same time with similar memory and CPU usage. The screenshot below shows some of them: enter image description here

The problem of this is that although I only required 18 cores, the script actually uses all the 128 cores on the server and this makes the server very slow. So my first question is why is this happening? And what is the difference between these processes with green color compared to the 18 processes with black color?

My second question is that I tried to use ulimit -Su 100 to set the soft limit of maximum number of processes that I can use before running Rscript 3_GA_optimization.R. I chose 100 based on the current number of processes I am using before running the script and the number of cores I want to use when running the script. However, I got an error saying:

Error in mcfork():
unable to fork, possible reason: Resource temporarily unavailable

So it seems that mclapply has to generate a lot more processes than mc.cores in order for the script to run, which is confusing to me. So my second question is that why does mclapply behaves in this way? Is there any other way to fix the total number of cores mclapply can use?


Solution

  • OP followed up in a comment on 2021-05-17 and confirmed that the problem was their parallelization via mclapply() called functions of the ranger package, which in turn parallelized using all available CPU cores. This nested parallelism, cause R to use many more CPU cores than available on the machine.