I am using foreach to parallelize my code. However, I found that the result differs depending on whether I add a print statement at the end of the loop.
Here is the version with the print statement at the end (I get the result I want):
library(foreach)
grid <- expand.grid(
lambda=c(1,2),
mtry=c(1),
nodsize=c(1)
)
n.cores <- parallel::detectCores() - 1
#https://stackoverflow.com/a/16718078/1979665
my.cluster <- parallel::makeCluster(n.cores, outfile = "")
doParallel::registerDoParallel(cl = my.cluster)
params.all <- foreach(i = 1:nrow(grid), .combine = "rbind") %dopar% {
  params <- grid[i, ]
  params$rmse <- "foo"
  print(params)
}
parallel::stopCluster(cl = my.cluster)
params.all
Result is:
lambda mtry nodsize rmse
1 1 1 1 foo
2 2 1 1 foo
Here is the version without the print statement at the end:
library(foreach)
grid <- expand.grid(
lambda=c(1,2),
mtry=c(1),
nodsize=c(1)
)
n.cores <- parallel::detectCores() - 1
#https://stackoverflow.com/a/16718078/1979665
my.cluster <- parallel::makeCluster(n.cores, outfile = "")
doParallel::registerDoParallel(cl = my.cluster)
params.all <- foreach(i = 1:nrow(grid), .combine = "rbind") %dopar% {
  params <- grid[i, ]
  params$rmse <- "foo"
  # print(params) LINE COMMENTED
}
parallel::stopCluster(cl = my.cluster)
params.all
The result is now:
[,1]
result.1 "foo"
result.2 "foo"
Isn't it weird or is it normal?
This is expected behaviour, and has nothing to do with parallel processing or %dopar%. The key is to ask yourself "what is each iteration of the %dopar% loop actually returning?"
To answer this, you need to realise that print invisibly returns the object it is printing:
x <- 1:3
y <- print(x)
#> [1] 1 2 3
y
#> [1] 1 2 3
Whereas assignment with <- invisibly returns the assigned value:
y <- (x <- "foo")
y
#> [1] "foo"
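If you want to check both claims without relying on what gets echoed at the console, base R's withVisible() reports a value together with its visibility flag. This is only a small sketch; the variable names printed and assigned are made up for illustration.

```r
# withVisible() returns list(value = ..., visible = ...) for an expression.
printed  <- withVisible(print(1:3))   # still prints 1 2 3 as a side effect
assigned <- withVisible(x <- "foo")

printed$visible    # FALSE: print() hands back its argument invisibly
assigned$visible   # FALSE: assignment also returns its value invisibly

# In both cases the value is still there and can be captured.
identical(printed$value, 1:3)
identical(assigned$value, "foo")
```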
In your first version, with print uncommented, each %dopar% iteration returns params, because the last call in the loop body is print(params), and print returns its argument.
When you comment out the line print(params), the last expression of each iteration becomes params$rmse <- "foo", so the iteration no longer returns params. It returns the value of the assignment params$rmse <- "foo", which is just the string "foo". Therefore, you get a single string "foo" for each iteration, and rbind stacks those strings into a one-column character matrix.
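To see this in isolation, you can evaluate the same kind of loop body as a plain braced block. This is a sketch with a hand-built one-row data frame standing in for grid[i,]; block_value and stacked are names invented for the example.

```r
# A braced block evaluates to its last expression, just like a loop body
# passed to %dopar% or a function body without an explicit return().
block_value <- {
  params <- data.frame(lambda = 1, mtry = 1, nodsize = 1)
  params$rmse <- "foo"   # last expression: its value is the right-hand side
}

block_value              # the string "foo", not the data frame

# Combining such values with rbind gives a one-column character matrix.
stacked <- rbind(block_value, block_value)
dim(stacked)
```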
We can see this is standard behaviour using only base R:
params.all <- do.call('rbind',
  lapply(1:3, function(i) {
    params <- iris[i, ]
    params$Species <- 'foo'
    print(params)
  }))
params.all
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1 5.1 3.5 1.4 0.2 foo
#> 2 4.9 3.0 1.4 0.2 foo
#> 3 4.7 3.2 1.3 0.2 foo
Versus
params.all <- do.call('rbind',
  lapply(1:3, function(i) {
    params <- iris[i, ]
    params$Species <- 'foo'
    #print(params)
  }))
params.all
#> [,1]
#> [1,] "foo"
#> [2,] "foo"
#> [3,] "foo"
The solution is to make sure that you explicitly return the object at the end of the %dopar% loop. You can just make the last line params rather than print(params).
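Applied to the code in the question, that change might look like the sketch below. It assumes the foreach and doParallel packages are installed, and it uses a fixed two-worker cluster purely to keep the example small; that worker count is an arbitrary choice, not something from the question.

```r
library(foreach)

grid <- expand.grid(
  lambda = c(1, 2),
  mtry = c(1),
  nodsize = c(1)
)

# Two workers is an assumption made for this sketch.
my.cluster <- parallel::makeCluster(2)
doParallel::registerDoParallel(cl = my.cluster)

params.all <- foreach(i = seq_len(nrow(grid)), .combine = "rbind") %dopar% {
  params <- grid[i, ]
  params$rmse <- "foo"
  params   # last expression: the whole row is what the iteration returns
}

parallel::stopCluster(cl = my.cluster)
params.all
```

Because each iteration now returns a one-row data frame, rbind combines them into a data frame with the rmse column included, and nothing needs to be printed on the workers.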
In other words, your question could be rewritten as "foreach gives the expected output only if I return the correct object from the loop".