
foreach, doParallel and random generation


Consider the very basic (and inefficient) code using parallel foreach for generating random values:

library("doParallel")  # attaches foreach and parallel as well
cl <- makeCluster(2)
registerDoParallel(cl)
foreach(i = 1:100) %dopar% rnorm(1)

Is this correct, or are additional steps needed for random number generation to work properly? I suspect it is enough, and quick checks seem to "prove" that seeding works properly, but I'd like to be sure the same holds on other platforms, since I want the code to be portable.


Solution

  • Your concerns are justified: random number generation does not magically work in parallel, and further steps need to be taken. When using the foreach framework, you can use the doRNG extension to get statistically sound random numbers in parallel loops as well.

    Example:

    library("doParallel")
    cl <- makeCluster(2)
    registerDoParallel(cl)
    
    ## Declare that parallel RNG should be used in a parallel foreach() call.
    ## %dorng% will still result in parallel processing; it uses %dopar% internally.
    library("doRNG")
    
    y <- foreach(i = 1:100) %dorng% rnorm(1)
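
    As a quick reproducibility check, here is a minimal sketch that is not part of the original answer. It assumes that calling set.seed() right before a %dorng% loop fixes the per-iteration random streams, as the doRNG documentation describes. The names r1, r2 and same, and the seed 123, are arbitrary choices for illustration.

    ```r
    library("doParallel")
    library("doRNG")

    cl <- makeCluster(2)
    registerDoParallel(cl)

    ## Same seed before each loop: both runs should draw the same values,
    ## regardless of which worker handles which iteration.
    set.seed(123)
    r1 <- foreach(i = 1:5) %dorng% rnorm(1)
    set.seed(123)
    r2 <- foreach(i = 1:5) %dorng% rnorm(1)

    ## Compare the drawn values only (the results also carry RNG attributes).
    same <- isTRUE(all.equal(unlist(r1), unlist(r2)))
    print(same)

    stopCluster(cl)
    ```

    If the streams are seeded as assumed, same should be TRUE; the same check with plain %dopar% would generally not be reproducible.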
    

    EDIT 2020-08-04: Previously this answer proposed the alternative:

    library("doRNG")
    registerDoRNG()
    y <- foreach(i = 1:100) %dopar% rnorm(1)
    

    However, the downside of that approach is that it is harder for a developer to use registerDoRNG() cleanly inside functions. Because of this, I recommend using %dorng% to specify that parallel RNG should be used.
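
    To illustrate the point about functions, here is a hedged sketch of a helper that uses %dorng% internally, so the caller never has to call registerDoRNG(). The function simulate_means() and its arguments n_rep and n are hypothetical names made up for this example; it assumes a parallel backend has already been registered by the caller.

    ```r
    library("doParallel")
    library("doRNG")

    ## Hypothetical helper: returns the means of n_rep samples of size n.
    ## %dorng% keeps the parallel RNG handling local to this call.
    simulate_means <- function(n_rep, n) {
      foreach(i = seq_len(n_rep), .combine = c) %dorng% mean(rnorm(n))
    }

    cl <- makeCluster(2)
    registerDoParallel(cl)

    ## Seeding before each call should make the two results agree.
    set.seed(42)
    a <- simulate_means(10, 100)
    set.seed(42)
    b <- simulate_means(10, 100)

    stopCluster(cl)
    ```

    The design choice here is that the function itself declares its need for parallel-safe RNG, rather than relying on global state set up elsewhere.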