I want to run multiple partial least squares models in R and am trying to take advantage of the parallel package. However, after running my code, I can see Rscript processes in my task manager that do not terminate unless I close RStudio. These Rscript processes are a problem because if I perform too many iterations, they eat up all of the free memory on my computer and grind it to a halt.
Does anyone know how to deal with these lingering Rscript processes, or can anyone point out the error in my code? I'm new to R.
Below is my sample code:
library(pls) #Package for PLS regression and MSC
library(parallel) #Allows for multi-core computations for cross-validation calculations
data(gasoline)
#Parallel Computing setup
num_cores <- 2
Made_Cluster = makeCluster(num_cores, type = "PSOCK")
num_iterations <- 10
for (i in 1:num_iterations) {
  pls.options(parallel = makeCluster(num_cores, type = "PSOCK"))
  gas1 <- plsr(octane ~ NIR, data = gasoline, validation = "LOO")
}
stopCluster(Made_Cluster)
I have confirmed that placing the makeCluster and stopCluster calls inside the loop produces the same Rscript processes that do not terminate. This also occurs even with num_cores <- 1:
library(pls) #Package for PLS regression and MSC
library(parallel) #Allows for multi-core computations for cross-validation calculations
data(gasoline)
#Parallel Computing setup
num_cores <- 1
num_iterations <- 10
for (i in 1:num_iterations) {
  Made_Cluster = makeCluster(num_cores, type = "PSOCK")
  pls.options(parallel = makeCluster(num_cores, type = "PSOCK"))
  gas1 <- plsr(octane ~ NIR, data = gasoline, validation = "LOO")
  stopCluster(Made_Cluster)
}
Finally, the terminal is displaying odd warnings about unused connections. The warnings differ from run to run, and I cannot reproduce them consistently. Here are a couple of examples:
Warning messages:
1: In if (!is.vector(X) || is.object(X)) X <- as.list(X) :
closing unused connection 4 (<-mycomputer:port#)
2: In is.data.frame(x) :
closing unused connection 13 (<-mycomputer:port#)
3: In crossprod(q.a) :
closing unused connection 17 (<-mycomputer:port#)
Here is my sessionInfo():
RStudio
$version
[1] ‘1.1.456’
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] pls_2.6-0
loaded via a namespace (and not attached):
[1] compiler_3.5.1 tools_3.5.1
You want to create only one "cluster" object (= one call to makeCluster(), not multiple). Something like:
cl <- makeCluster(num_cores, type = "PSOCK")
pls.options(parallel = cl)
[...]
for (i in 1:num_iterations) {
  gas1 <- plsr(octane ~ NIR, data = gasoline, validation = "LOO")
}
stopCluster(cl)
Explanation of your observations: If you use pls.options(parallel = makeCluster(...)), you create another cluster in that call, which is never explicitly stopped because you don't hold a handle to it. Its underlying connections are eventually closed when R's garbage collector finds such a "stray" cluster; this is why and when you get those warnings. If you put pls.options(parallel = makeCluster(...)) inside the loop, you create one stray cluster per iteration and get even more warnings. The garbage collector runs at "random" times, which is why the traceback of those warnings appears random and non-reproducible.
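If you want to be sure the workers are shut down even when the code in between fails, one option is to tie stopCluster() to the function that created the cluster. The following is only a minimal sketch using the parallel package alone; with_cluster is a hypothetical helper name made up for this illustration, not part of parallel or pls:

```r
library(parallel)

# Hypothetical helper (not part of any package): create ONE cluster,
# pass its handle to `fun`, and always stop it on the way out.
with_cluster <- function(num_cores, fun) {
  cl <- makeCluster(num_cores, type = "PSOCK")
  # on.exit() runs even if fun() throws an error, so no worker
  # processes are left behind without a handle to stop them.
  on.exit(stopCluster(cl), add = TRUE)
  fun(cl)
}

# Example use: square a few numbers on two workers.
result <- with_cluster(2, function(cl) {
  parLapply(cl, 1:4, function(x) x^2)
})
```

In your case the function body would presumably call pls.options(parallel = cl) followed by the plsr() loop. You may also want to reset the option afterwards (for example pls.options(parallel = NULL)) so that pls does not keep pointing at a cluster that has already been stopped.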