My B object is a big matrix, 100,000 x 5,000 (about 2 GB).
My A object is smaller, 1,000 x 5,000.
library(parallel)

# For one column Y of A, regress every column of B on Y (with optional covariates)
# and keep the p-value of the X coefficient from each fit.
analyse_with_glm <- function(Y) {
  cond1 = unlist(apply(B, 2, function(X) coef(summary(glm(Y ~ X)))[, 4][2]))
  cond2 = unlist(apply(B, 2, function(X) coef(summary(glm(Y ~ X + cov2)))[, 4][2]))
  cond3 = unlist(apply(B, 2, function(X) coef(summary(glm(Y ~ X + cov3)))[, 4][2]))
  list(cond1, cond2, cond3)
}

cl  = makeCluster(nb_cpu, type = "FORK", outfile = "outcluster.log")
res = parApply(cl, A, 2, analyse_with_glm)
Initially I have a single rsession process using 2.1 GB of memory. After calling parApply I get nb_cpu worker processes of about 4.5 GB each. I use the 'top' command to monitor the workers and their memory, and this is not superfluous usage that the garbage collector could release: the workers crash because they run out of memory. This runs on a machine with 128 GB of RAM and 30 workers (nb_cpu = 30 in my code), and 30 workers at 4.5 GB each is already about 135 GB.
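For reference, here is a small diagnostic sketch (not part of my original code, just an idea) that asks each forked worker what its own garbage collector reports and returns its PID, so the numbers can be compared with the resident size 'top' shows for the same processes:

library(parallel)

# Diagnostic only: report PID and R-level allocations from inside each worker.
cl <- makeCluster(nb_cpu, type = "FORK")
worker_info <- parLapply(cl, seq_len(nb_cpu), function(i) {
  list(
    pid    = Sys.getpid(),      # match against the PID column in 'top'
    r_used = sum(gc()[, 2])     # Mb currently allocated to R objects in this worker
  )
})
stopCluster(cl)
worker_info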
NB: I also tried the opposite, iterating over B (the big matrix) in parApply instead of A, but it did not fix the issue.
This answer might be partial, as I still find R's behavior odd when it comes to parallelizing code. If you run the code from RStudio, the parallel workers tend to be inflated by the size of ~/.rstudio/suspended-session-data/
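To check whether that folder is actually large enough to explain the per-worker footprint, its size can be measured from R. This is only a quick sketch; the path is RStudio's default location and may differ on your setup.

# Total size of the suspended-session data (default RStudio path; adjust if needed)
suspended <- path.expand("~/.rstudio/suspended-session-data/")
files <- list.files(suspended, recursive = TRUE, full.names = TRUE, all.files = TRUE)
sum(file.size(files), na.rm = TRUE) / 1024^3   # size in GB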
So to avoid it, here is a crude workaround (a sketch of it as a standalone script follows the list).
1. Clean your environment
2. Log-out
3. Log-in
4. Load your data
5. Run parallel code
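If you prefer not to log out, an equivalent of steps 1-5 is to run the analysis from a plain terminal R session (e.g. Rscript run_analysis.R) instead of RStudio, so that no session data is attached to the process that forks the workers. This is only a sketch; run_analysis.R, my_data.RData and the output file name are hypothetical placeholders for your own files.

## run_analysis.R -- steps 1-5 as a standalone script, run from a terminal
library(parallel)

load("my_data.RData")    # step 4: load A, B, cov2, cov3 (hypothetical file name)
nb_cpu <- 30

analyse_with_glm <- function(Y) {
  cond1 = unlist(apply(B, 2, function(X) coef(summary(glm(Y ~ X)))[, 4][2]))
  cond2 = unlist(apply(B, 2, function(X) coef(summary(glm(Y ~ X + cov2)))[, 4][2]))
  cond3 = unlist(apply(B, 2, function(X) coef(summary(glm(Y ~ X + cov3)))[, 4][2]))
  list(cond1, cond2, cond3)
}

cl  <- makeCluster(nb_cpu, type = "FORK", outfile = "outcluster.log")   # step 5
res <- parApply(cl, A, 2, analyse_with_glm)
stopCluster(cl)

saveRDS(res, "results.rds")   # keep the results (hypothetical output file)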
INFO: