R: loading multiple RData files with mclapply doesn't work


I wanted to load multiple RData files with one command, as explained by Johua, using:

> lapply(c(a_data, b_data, c_data, d_data), load, .GlobalEnv)
[[1]]
[1] "nRTC_Data"

[[2]]
[1] "RTA_Data"

[[3]]
[1] "RTC_Data"

[[4]]
[1] "RTA_Data"

> rm(a_data, b_data, c_data, d_data); ls()
 [1] "nRTC_Data"       "RTA_Data"           "RTAC_data"     "RTC_Data"    
      

However, since my RData files are big and I found no time improvement between lapply() and multiple load() calls, I decided to use a multi-core approach like the following:

library(parallel)
mclapply(c(a_data, b_data, c_data, d_data),load,.GlobalEnv, mc.cores = parallel::detectCores())

This significantly improved the loading time, and it also returns the list:

   [[1]]
    [1] "nRTC_Data"
    
    [[2]]
    [1] "RTA_Data"
    
    [[3]]
    [1] "RTC_Data"
    
    [[4]]
    [1] "RTA_Data"

However, nothing is found in my workspace:

> rm(a_data, b_data, c_data, d_data); ls()
character(0)

I also tried replacing .GlobalEnv with environment(), but that still didn't work.

Does anyone have a clue?

FYI, you can reproduce this with the following commands:

> a = "aa";save(a, file = "aa.RData")
> b = "bb";save(b, file = "bb.RData")
> c = "cc";save(c, file = "cc.RData")
> d = "dd";save(d, file = "dd.RData")

> # lapply approach
> rm(list = ls())
> a = "aa.RData"; b = "bb.RData"; c = "cc.RData"; d = "dd.RData"
> lapply(c(a, b, c, d), load, .GlobalEnv); rm(a, b, c, d) 

> # mclapply approach
> rm(list = ls())
> a = "aa.RData"; b = "bb.RData"; c = "cc.RData"; d = "dd.RData"
> mclapply(c(a, b, c, d), load, .GlobalEnv, mc.cores = parallel::detectCores()); rm(a, b, c, d)

Solution

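  • It's because mclapply forks separate child processes. Each child gets its own copy of the workspace, so load(..., .GlobalEnv) fills the child's global environment, which is discarded when the child exits; only the function's return value (here, the object names returned by load) is sent back to the parent session. In the code below I use mclapply with a myload function that loads the RData file and returns the loaded object itself. The difference from your lapply version is that the data ends up in the list returned by mclapply rather than in your workspace.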

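    library(parallel)

    # Load one RData file inside the worker and return the object itself,
    # so the data travels back to the parent in mclapply's result.
    myload <- function(file){
      name <- load(file)  # loads into this function's environment, returns the object name
      get(name)           # assumes the file holds exactly one object
    }
    
    a = "aa.RData"; b = "bb.RData"; c = "cc.RData"; d = "dd.RData"
    
    res <- mclapply(c(a, b, c, d), myload, mc.cores = parallel::detectCores())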
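If you want the objects back in your workspace under their original names, as with the lapply version, one possible variation is to have each worker return a named list of everything in its file and then copy those lists into .GlobalEnv in the parent session. This is only a sketch: load_as_list is a hypothetical helper name, the two example files are made up for illustration, and it assumes that object names do not clash across files (a later file would overwrite an earlier one).

```r
library(parallel)

# Hypothetical helper: load every object from one RData file into a
# private environment and return them all as a named list.
load_as_list <- function(file) {
  env <- new.env()
  load(file, envir = env)
  as.list(env, all.names = TRUE)
}

# Small example files in a temporary directory (illustration only).
tmp <- tempdir()
files <- file.path(tmp, c("aa.RData", "bb.RData"))
a <- "aa"; a2 <- 1:3
save(a, a2, file = files[1])   # two objects in one file
b <- "bb"
save(b, file = files[2])
rm(a, a2, b)

# Forking is not available on Windows, where mc.cores must be 1.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- mclapply(files, load_as_list, mc.cores = cores)

# Concatenate the per-file lists and copy the objects into the
# parent's global environment under their saved names.
objs <- do.call(c, res)
list2env(objs, envir = .GlobalEnv)
```

Keep in mind that the workers still have to serialize the loaded objects back to the parent, so for very large files the transfer may eat into the time saved by loading in parallel.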