Tags: r, for-loop, parallel-processing, parallel-foreach

foreach gives the expected output only if I put a print statement at the end of the loop


I am using foreach to parallelize my code. However, the result differs depending on whether I add a print statement at the end of the loop.

Here is the version with the print statement at the end (I get the result I want):

library(foreach)

grid <- expand.grid(
  lambda=c(1,2),
  mtry=c(1),
  nodsize=c(1)
)

n.cores <- parallel::detectCores() - 1
#https://stackoverflow.com/a/16718078/1979665
my.cluster <- parallel::makeCluster(n.cores, outfile = "")
doParallel::registerDoParallel(cl = my.cluster)
params.all <- foreach (i = 1:nrow(grid), .combine = "rbind") %dopar% {
  params <- grid[i,]
  params$rmse <- "foo"
  print(params)
}
parallel::stopCluster(cl = my.cluster)
params.all

Result is:

  lambda mtry nodsize rmse
1      1    1       1  foo
2      2    1       1  foo

Here is without the print statement at the end:

library(foreach)

grid <- expand.grid(
  lambda=c(1,2),
  mtry=c(1),
  nodsize=c(1)
)

n.cores <- parallel::detectCores() - 1
#https://stackoverflow.com/a/16718078/1979665
my.cluster <- parallel::makeCluster(n.cores, outfile = "")
doParallel::registerDoParallel(cl = my.cluster)
params.all <- foreach (i = 1:nrow(grid), .combine = "rbind") %dopar% {
  params <- grid[i,]
  params$rmse <- "foo"
  # print(params) LINE COMMENTED
}
parallel::stopCluster(cl = my.cluster)
params.all

The result is now:

         [,1] 
result.1 "foo"
result.2 "foo"

Is this weird, or is it normal?


Solution

  • This is expected behaviour, and has nothing to do with parallel processing or dopar. The key is to ask yourself "what is each iteration of dopar actually outputting?"

    To answer this, you need to realise that print invisibly returns the object it is printing:

    x <- 1:3
    y <- print(x)
    #> [1] 1 2 3
    
    y
    #> [1] 1 2 3
    

    Whereas assignment with <- invisibly returns the assigned value:

    y <- (x <- "foo")
    
    y
    #> [1] "foo"
    

    In your first version, with print uncommented, each dopar iteration is outputting params because the last call in the dopar loop is print(params).

    When you comment out the line print(params), the last expression of each iteration is params$rmse <- "foo", so the iteration no longer returns params. It returns the value of the assignment params$rmse <- "foo", which is just the string "foo". Therefore, you get a single string "foo" for each iteration of dopar, and rbind combines those strings into a character matrix.
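
    The earlier example only used a plain <- assignment, so here is a minimal sketch showing that the same holds for a $<- assignment into a data frame. The function names last_is_assignment and last_is_object are made up for this illustration and do not come from the question:

```r
# Hypothetical helper: its last expression is a `$<-` assignment,
# so the function returns the assigned value, not the data frame.
last_is_assignment <- function() {
  params <- data.frame(lambda = 1)
  params$rmse <- "foo"
}

out <- last_is_assignment()
out
#> [1] "foo"

# Hypothetical helper: ending with the object itself returns
# the whole modified data frame instead.
last_is_object <- function() {
  params <- data.frame(lambda = 1)
  params$rmse <- "foo"
  params
}

str(last_is_object())
```

    The body of a foreach loop follows the same rule as a function body: whatever the last expression evaluates to is what the iteration hands to .combine.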

    We can see this is standard behaviour using only base R:

    params.all <- do.call('rbind',
      lapply(1:3, function(i) {
          params <- iris[i,]
          params$Species <- 'foo'
          print(params)
    }))
    
    params.all
    #>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    #> 1          5.1         3.5          1.4         0.2     foo
    #> 2          4.9         3.0          1.4         0.2     foo
    #> 3          4.7         3.2          1.3         0.2     foo
    

    Versus

    params.all <- do.call('rbind',
      lapply(1:3, function(i) {
          params <- iris[i,]
          params$Species <- 'foo'
          #print(params)
    }))
    
    params.all
    #>      [,1] 
    #> [1,] "foo"
    #> [2,] "foo"
    #> [3,] "foo"
    

    The solution is to make sure the object you want is the last expression evaluated in the dopar loop. You can just make the last line params rather than print(params).
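
    As a sketch, the loop from the question could then look like the following. It assumes the foreach and doParallel packages are installed; the fixed worker count of 2 is an arbitrary choice for this toy example rather than something taken from the question:

```r
library(foreach)

grid <- expand.grid(
  lambda = c(1, 2),
  mtry = c(1),
  nodsize = c(1)
)

# Assumption: two workers are plenty for a two-row grid.
my.cluster <- parallel::makeCluster(2)
doParallel::registerDoParallel(cl = my.cluster)

params.all <- foreach(i = seq_len(nrow(grid)), .combine = "rbind") %dopar% {
  params <- grid[i, ]
  params$rmse <- "foo"
  params  # last expression: this is what the iteration returns
}

parallel::stopCluster(cl = my.cluster)
params.all
```

    With params as the last line, each iteration returns a one-row data frame and rbind stacks them, without printing anything on the workers.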

    In other words, your question could be rewritten as "foreach gives the expected output only if I return the correct object from the loop".