I want to run an R script on a computer cluster that uses recursive feature elimination (`rfe`) from the caret package. Ideally, I would like to run it on multiple cores in parallel. In a coworker's script, I found that the `doMC` package is used. I read that this package is used together with the `foreach` package. But in the script I have, the library is simply loaded, and on the line before the `rfe` call there is `registerDoMC(5)`. There is not a single use of `foreach` in the whole script.
Will `doMC` do anything here, or does it only work together with `foreach`?
Is there a way to distribute the resource-consuming `rfe` process across multiple cores?
Read the documentation (`?rfe`):
> rfe can be used with "explicit parallelism", where different resamples (e.g. cross-validation group) can be split up and run on multiple machines or processors. By default, rfe will use a single processor on the host machine. As of version 4.99 of this package, the framework used for parallel processing uses the foreach package. To run the resamples in parallel, the code for rfe does not change; prior to the call to rfe, a parallel backend is registered with foreach (see the examples below).
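A minimal sketch of that pattern follows. It assumes a Unix-like system, since `doMC` relies on forking, and uses the built-in `mtcars` data with caret's `lmFuncs` purely for illustration. The data, `sizes`, and resampling settings are placeholders, not taken from the original script.

```r
library(caret)
library(doMC)   # fork-based foreach backend; Unix-like systems only

# Register 5 workers before calling rfe (same as registerDoMC(5) in the script)
registerDoMC(cores = 5)

# Illustrative data only: predict mpg from the other mtcars columns
x <- mtcars[, -1]
y <- mtcars$mpg

# allowParallel = TRUE is already the default; it is spelled out for clarity
ctrl <- rfeControl(functions = lmFuncs, method = "cv", number = 5,
                   allowParallel = TRUE)

set.seed(42)
# The rfe call itself is unchanged; resamples go to the registered backend
res <- rfe(x, y, sizes = c(2, 4, 6), rfeControl = ctrl)
print(res)
```

On a cluster, the number passed to `registerDoMC` should not exceed the cores the job scheduler has allocated to the job. Note also that `doMC` only uses cores on a single machine; spreading resamples over several nodes would need a different `foreach` backend.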
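So `caret::rfe` uses `foreach` internally. That is why `registerDoMC(5)` before the `rfe` call is enough: the resamples are run in parallel on five cores even though the script never calls `foreach` itself.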
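To see why a script can benefit from `doMC` without ever calling `foreach`, note that registering a backend sets global state that every `%dopar%` loop picks up. The sketch below (assuming a Unix-like system) registers `doMC` and then runs a `%dopar%` loop that never mentions it; the loop is still dispatched to the forked workers, just as the loops inside `rfe` are.

```r
library(foreach)
library(doMC)

registerDoMC(cores = 2)

# The registered backend is global: any %dopar% loop, including the ones
# caret runs inside rfe, is dispatched to it automatically.
backend <- getDoParName()
workers <- getDoParWorkers()

# This loop never mentions doMC, yet runs on the registered backend;
# each iteration returns the process ID it ran in.
pids <- foreach(i = 1:4, .combine = c) %dopar% Sys.getpid()

cat("backend:", backend, "workers:", workers, "\n")
print(unique(pids))
```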