
Error in parallel calculation on R function


I am learning parallel programming in R with simple examples and tried to implement the function shown in the code below. It loads the processor, but it returns no calculation results. Could you please tell me where I made a mistake in the code?

library(foreach)
library(doSNOW)

power_Ks_Cauchy_par <- function(resp, alpha, distData, sample)
{
       corNum<-detectCores()-1
       cluster <- makeCluster(corNum, type = "SOCK", outfile="")
       registerDoSNOW(cluster)
       power <- c()
       num <- c()
         foreach(i=3:sample, .combine=rbind) %dopar%{
            distPar <- fitdistrplus::fitdist(data = distData[1:i],"cauchy")
            loc <- distPar$estimate[1]
            sc <- distPar$estimate[2] 
            test<-mean(replicate(resp,(ks.test(rcauchy(i, loc, sc),"pnorm")$`p.value`<alpha)))
            power<-c(power,test)
            num <- c(num,i)
          }
        power <- data.frame(power, samples=num)
        stopCluster(cluster)
        return(power)
}

repl <- 10000
alpha <- 0.05
samles <- length(baseDataMeh$Sigma02)
time1 <- system.time({
  resKStestCauchy_1 <- power_Ks_Cauchy_par(repl, alpha = alpha,
                                           distData = baseDataMeh$Sigma02, sample = samles)
})

I tried several approaches to parallelization, but on my architecture only the implementation with these libraries reduced the calculation time on simple examples.

The same code written with a plain for loop inside a function works, but it takes a very long time to run.

The parallel version loads the processor, but the returned result contains no calculated values.

Could you please tell me what I am doing wrong?


Solution

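  • Consider assigning the result of the foreach call directly to an object. With .combine, foreach behaves like an apply-family function (e.g., lapply, sapply): it returns an object with one element per iteration, rather than acting like a for loop that iteratively updates objects in the calling environment. With %dopar%, the loop body runs in separate worker processes, so assignments such as power <- c(power, test) change only the worker's copy and never reach the power and num objects in the function.

    The adjustment below returns test from each foreach iteration and assigns the combined result to power_vec. The num vector that captured each i is not needed, because the sample sizes are simply 3:sample.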

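    power_vec <- foreach(
      i = 3:sample, .combine = rbind
    ) %dopar% {
      # Fit a Cauchy distribution to the first i observations
      distPar <- fitdistrplus::fitdist(data = distData[1:i], "cauchy")
      loc <- distPar$estimate[1]
      sc <- distPar$estimate[2]
      # Share of resp simulated samples for which the KS test rejects normality
      test <- mean(
        replicate(
          resp, (ks.test(rcauchy(i, loc, sc), "pnorm")$`p.value` < alpha)
        )
      )
      test  # last expression: the value foreach collects for this iteration
    }

    # Results come back in iteration order, so they line up with 3:sample
    power_df <- data.frame(power = power_vec, samples = 3:sample)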
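
    The difference between updating an outer object and returning a value can be seen in a minimal sketch like the one below. The function square_par and its arguments are hypothetical and only serve the illustration; the sketch assumes the foreach and doSNOW packages are installed and that two local socket workers can be started.

```r
library(foreach)
library(doSNOW)

# Hypothetical toy helper (not from the original post): squares 1..n on a cluster.
square_par <- function(n, workers = 2) {
  cluster <- snow::makeCluster(workers, type = "SOCK")
  on.exit(snow::stopCluster(cluster), add = TRUE)  # stop workers even on error
  registerDoSNOW(cluster)

  lost <- c()  # each worker modifies only its own copy of this object
  kept <- foreach(i = 1:n, .combine = c) %dopar% {
    lost <- c(lost, i^2)  # side effect: stays on the worker
    i^2                   # last value: sent back and combined by foreach
  }
  list(lost = lost, kept = kept)
}

res <- square_par(4)
length(res$lost)  # 0: the outer object was never updated
res$kept          # 1 4 9 16
```

    Registering stopCluster with on.exit is a design choice in this sketch, not part of the original answer: it shuts the workers down even if an iteration fails, which a stopCluster call at the end of the function would not do.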