parallel::mclapply() adds or removes bindings to the global environment. Which ones?


Why this matters

For drake, I want users to be able to execute mclapply() calls within a locked global environment. The environment is locked for the sake of reproducibility. Without locking, data analysis pipelines could invalidate themselves.

Evidence that mclapply() adds or removes global bindings

set.seed(0)
a <- 1

# Works as expected.
rnorm(1)
#> [1] 1.262954
tmp <- parallel::mclapply(1:2, identity, mc.cores = 2)

# No new bindings allowed.
lockEnvironment(globalenv())

# With a locked environment
a <- 2 # Existing bindings are not locked.
b <- 2 # As expected, we cannot create new bindings.
#> Error in eval(expr, envir, enclos): cannot add bindings to a locked environment
tmp <- parallel::mclapply(1:2, identity, mc.cores = 2) # Unexpected error.
#> Warning in parallel::mclapply(1:2, identity, mc.cores = 2): all scheduled
#> cores encountered errors in user code

Created on 2019-01-16 by the reprex package (v0.2.1)

EDIT

For the original motivating problem, see https://github.com/ropensci/drake/issues/675 and https://ropenscilabs.github.io/drake-manual/hpc.html#parallel-computing-within-targets.


Solution

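  • mclapply() removes .Random.seed from the global environment, so remove it yourself before you lock the environment. You also need to load the package (or call the function once beforehand) and create the tmp binding in advance, because no new bindings can be added after locking.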

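    library(parallel)
    tmp <- NULL  # Create the binding before locking.
    # Remove the seed now so that mclapply() has nothing to remove later.
    rm(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
    lockEnvironment(globalenv())
    tmp <- parallel::mclapply(1:2, identity, mc.cores = 2)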
    

    Of course, functions that need .Random.seed, such as rnorm(), will then not work.
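
    To see where .Random.seed comes into play, one option is to look at the helper that handles the RNG state for mclapply(). This is only a sketch: mc.set.stream() is an internal, unexported function of parallel, so its name and body are an assumption that may not hold in every R version.

    ```r
    # Fetch the internal helper from the parallel namespace.
    # Assumption: it is still called mc.set.stream in the installed R version.
    f <- getFromNamespace("mc.set.stream", "parallel")

    # Check whether its source refers to .Random.seed.
    src <- deparse(f)
    touches_seed <- any(grepl(".Random.seed", src, fixed = TRUE))

    # Print the function to read what it does with the binding.
    print(f)
    ```

    If touches_seed is TRUE, the printed body shows how the binding is handled for each RNG kind.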

    A workaround is to change the RNG kind to "L'Ecuyer-CMRG"; see also ?nextRNGStream:

    library(parallel)
    tmp <- NULL
    RNGkind("L'Ecuyer-CMRG")
    lockEnvironment(globalenv())
    tmp <- parallel::mclapply(1:2, rnorm, mc.cores = 2)
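
    As a rough illustration of what the "L'Ecuyer-CMRG" kind provides, the sketch below sets the kind, seeds the generator, and derives a second stream with nextRNGStream(). The variable names s1 and s2 are only illustrative, and the global environment is left unlocked here.

    ```r
    library(parallel)

    # Switch to the generator that supports multiple streams.
    RNGkind("L'Ecuyer-CMRG")
    set.seed(0)

    # .Random.seed now exists in the global environment as an integer vector.
    seed_exists <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
    s1 <- get(".Random.seed", envir = globalenv())

    # Derive the seed of the next independent stream from the current one.
    s2 <- nextRNGStream(s1)
    ```

    With this kind, the seed binding is already present before the environment is locked, which is consistent with the workaround above.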
    

    EDIT

    I thought of another solution to your problem, and I think it will work with any RNG (I did not test it much). You can override the function that removes .Random.seed with one that just sets it to NULL:

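    library(parallel)
    # Replacement for parallel's internal mc.set.stream(): instead of removing
    # .Random.seed, set it to NULL so that no binding is deleted.
    mc.set.stream <- function () {
      if (RNGkind()[1L] == "L'Ecuyer-CMRG") {
        assign(".Random.seed", get("LEcuyer.seed", envir = RNGenv),
               envir = .GlobalEnv)
      } else {
        if (exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)) {
          assign(".Random.seed", NULL, envir = .GlobalEnv)
        }
      }
    }
    # RNGenv is internal to parallel, so look it up from that namespace.
    environment(mc.set.stream) <- asNamespace("parallel")

    assignInNamespace("mc.set.stream", mc.set.stream, asNamespace("parallel"))
    tmp <- NULL
    set.seed(0)
    lockEnvironment(globalenv())
    tmp <- parallel::mclapply(1:2, rnorm, mc.cores = 2)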
    

    One final thought: you can create a new environment containing everything you don't want to be changed, lock it, and work in there.
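
    A minimal sketch of that last idea, assuming a hypothetical environment called sandbox and a single protected value a. The global environment stays unlocked, so mclapply() remains free to manage .Random.seed there.

    ```r
    library(parallel)

    # Hypothetical sandbox holding the values that must not change.
    sandbox <- new.env(parent = globalenv())
    assign("a", 1, envir = sandbox)

    # Lock the sandbox and its bindings; the global environment is untouched.
    lockEnvironment(sandbox, bindings = TRUE)

    # Forking is not available on Windows, so use a single core there.
    n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

    # Evaluate the parallel call with the sandbox as the evaluation environment.
    tmp <- evalq(
      parallel::mclapply(1:2, function(i) a + i, mc.cores = n_cores),
      sandbox
    )

    # Attempts to change a locked binding fail with an error.
    locked_error <- tryCatch(
      {
        assign("a", 2, envir = sandbox)
        NULL
      },
      error = function(e) conditionMessage(e)
    )
    ```

    Here tmp holds the results computed from the protected value, and locked_error captures the message raised when the sandbox is modified.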