[SOLVED] Is `Map()` when used in a `data.table` parallel?

Is `Map()` when used in a `data.table` parallel? - R

From the data.table package website, given that:

"many common operations are internally parallelized to use multiple CPU threads"

I would like to know if that is the case when Map() is used within a data.table?

The reason for asking is because I have noticed that comparing the same operation on a large dataset (cor.test(x, y) with x = .SD and y being a single column of the dataset), the one using Map() performs quicker than when furrr::fututre_map2() is used.

Solution

You can use this rather explorative approach and see whether the time elapsed shrinks when more threads are used. Note that on my machine the maximum number of usable threads is just one, so no difference is possible

library(data.table)

dt <- data.table::data.table(a = 1:3,
                             b = 4:6)
dt
#>    a b
#> 1: 1 4
#> 2: 2 5
#> 3: 3 6

data.table::getDTthreads()
#> [1] 1

# No Prallelisation ----------------------------------
data.table::setDTthreads(1)
system.time({
  
  dt[, lapply(.SD,
              function(x) {
                Sys.sleep(2)
                x}
  )
  ]
})
#>    user  system elapsed 
#>   0.009   0.001   4.017

# Parallel -------------------------------------------
# use multiple threads
data.table::setDTthreads(2)
data.table::getDTthreads()
#> [1] 1

# if parallel, elapsed should be below 4
system.time({
  
  dt[, lapply(.SD,
              function(x) {
                Sys.sleep(2)
                x}
  )
  ]
})
#>    user  system elapsed 
#>   0.001   0.000   4.007

# Map -----------------------------------------------
# if parallel, elapsed should be below 4
system.time({
  
  dt[, Map(f = function(x, y) {
    Sys.sleep(2)
    x},
    .SD,
    1:2
    
  )
  ]
})
#>    user  system elapsed 
#>   0.002   0.000   4.005