
What exactly does the first argument of the makeCluster() function do?


I am new to R programming, as you can probably tell from my question. I am trying to take advantage of the parallel computing ability of the train() function.

library(parallel)
# detect the number of physical cores available to the parallel package
nCores <- detectCores(logical = FALSE)
cat(nCores, "cores detected.\n")

# detect the number of logical processors (threads)
nThreads <- detectCores(logical = TRUE)
cat(nThreads, "threads detected.\n")

# Create doSNOW compute cluster (try 64)
# One can increase up to 128 nodes
# Each node requires 44 Mbyte RAM under WINDOWS.
cluster <- makeCluster(128, type = "SOCK")
class(cluster)

I need someone to help me interpret this code. Originally the first argument of makeCluster() was nThreads, but after running

nCores <- detectCores(logical = FALSE)

I learned that I have 4 threads available. I changed the value to 128 based on the comment in the guide. Will this enable me to run 128 iterations of the train function simultaneously? If so, what is the point of getting the number of threads and cores my computer has in the first place?


Solution

  • First, detect how many cores you have.

    nCores <- detectCores() - 1
    

    Most people subtract 1 so that one core is left free for other work (the OS and your main R session).
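    A slightly more defensive version of the same idea, as a sketch: detectCores() is documented to return NA when the count is unknown, so this falls back to a single worker in that case and never asks for fewer than 1. The variable names n_physical and n_workers are just illustrative.

```r
library(parallel)

# Ask for physical cores; detectCores() may return NA if it cannot tell.
n_physical <- detectCores(logical = FALSE)
if (is.na(n_physical)) n_physical <- 1L

# Leave one core free for the OS and the main R session, but never go below 1.
n_workers <- max(1L, n_physical - 1L)
cat("Using", n_workers, "worker(s)\n")
```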

    cluster <- makeCluster(nCores)
    

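    This creates one cluster with nCores worker processes (nodes): the first argument of makeCluster() is simply how many separate R worker processes to start. Asking for more workers than you have cores (e.g. 128 on a 4-core machine) does start that many processes, but only about as many as you have cores actually run at the same moment; the rest wait their turn, and every extra worker costs memory and start-up time. That is why you detect the number of cores first. There are several ways to send work to the cluster (parLapply(), parApply(), foreach with the doParallel backend, ...). Whichever one you choose, it distributes the work across the workers of the cluster you created.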
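    To see what the first argument controls, here is a small sketch that starts 2 PSOCK workers, checks that the cluster object has one entry per worker, and confirms each worker is a separate R process with its own PID. The value 2 is arbitrary and chosen only to keep the demo light.

```r
library(parallel)

# The first argument is the number of worker processes (nodes) to start.
cl <- makeCluster(2)

# The cluster object is a list with one entry per worker.
n_nodes <- length(cl)
print(n_nodes)

# Each worker is its own R process, so each reports a different PID.
pids <- unlist(clusterCall(cl, Sys.getpid))
print(pids)

# parLapply() spreads the elements of 1:6 across the two workers.
squares <- unlist(parLapply(cl, 1:6, function(i) i^2))
print(squares)  # 1 4 9 16 25 36

stopCluster(cl)
```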

    Here is a small example from my own code:

      library(parallel)
      no_cores <- detectCores() - 1
      cluster <- makeCluster(no_cores)
      result <- parLapply(cluster, docs$text, preProcessChunk)
      stopCluster(cluster)
    
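    A self-contained variant of that example, as a sketch: docs and preProcessChunk below are hypothetical stand-ins for the objects in my code, and run_parallel is an illustrative helper name. It wraps the cluster in a function so that on.exit() shuts the workers down even if the worker function throws an error, and it never starts more workers than there are items to process.

```r
library(parallel)

# Hypothetical stand-ins for `docs` and `preProcessChunk`.
docs <- data.frame(text = c("Hello World", "Parallel R", "makeCluster Demo"),
                   stringsAsFactors = FALSE)
preProcessChunk <- function(txt) tolower(trimws(txt))

run_parallel <- function(x, fun) {
  # Cores minus one, at least 1 (also if detectCores() returns NA),
  # and no more workers than there are items.
  n_workers <- max(1L, detectCores() - 1L, na.rm = TRUE)
  n_workers <- min(n_workers, length(x))
  cl <- makeCluster(n_workers)
  # Stop the workers even if `fun` fails on one of them.
  on.exit(stopCluster(cl), add = TRUE)
  parLapply(cl, x, fun)
}

result <- unlist(run_parallel(docs$text, preProcessChunk))
print(result)  # "hello world" "parallel r" "makecluster demo"
```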

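    I also see that you're using type = "SOCK". I always use type = "PSOCK", which is the default. In the parallel package, makeCluster() handles "PSOCK" and "FORK" itself and passes other types, such as "SOCK", on to the snow package, so "SOCK" only works if snow is installed. FORK also exists, but whether you can use it depends on your OS: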

    FORK: "to divide in branches and go separate ways"
    Systems: Unix/Mac (not Windows)
    Environment: workers get a copy of the master's entire environment
    
    PSOCK: Parallel Socket Cluster
    Systems: All (including Windows)
    Environment: Empty (fresh R sessions; send objects with clusterExport())
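    The environment difference is easy to check, as a sketch: the variable threshold is a made-up example living only in the master session. PSOCK workers cannot see it until it is copied over with clusterExport(), while FORK workers inherit it. The FORK part is skipped on Windows, where forking is not available. This assumes the code runs at top level, since clusterExport() looks in the global environment by default.

```r
library(parallel)

threshold <- 10  # hypothetical variable defined only in the master session

cl <- makeCluster(2, type = "PSOCK")

# PSOCK workers start as fresh R sessions, so `threshold` is not there yet.
before <- unlist(clusterEvalQ(cl, exists("threshold")))
print(before)  # FALSE FALSE

# Copy the variable into each worker's global environment.
clusterExport(cl, "threshold")
after <- unlist(clusterEvalQ(cl, exists("threshold")))
print(after)   # TRUE TRUE

stopCluster(cl)

# FORK workers are copies of the master process, so no export is needed.
if (.Platform$OS.type != "windows") {
  cl_fork <- makeCluster(2, type = "FORK")
  forked <- unlist(clusterEvalQ(cl_fork, exists("threshold")))
  print(forked)  # TRUE TRUE
  stopCluster(cl_fork)
}
```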