I am working with daily series of satellite images on a workstation with 64 cores. For each image, I perform some algebraic operations over all pixels using a foreach loop. Some testing revealed that the optimal number of cores for this foreach loop is 20.
This is roughly what I am doing now:
library(doParallel)  # also attaches foreach
# register 20 workers once, before looping over images
registerDoParallel(20)
for (i in 1:length(number_of_daily_images)) {
  # perform some pre-processing on each image
  # loop over pixels in parallel
  out <- foreach(j = 1:length(number_of_pixels_in_each_image)) %dopar% {
    # perform some calculations
  } # end inner loop
} # end outer loop
I only have to load each satellite image once, so the code does very little I/O and its run time is dominated by computation. That leaves room for speeding it up further: since I am only using 20 of the 64 cores (roughly one third), I would like to process three days simultaneously to save some precious time in my workflow.
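The core budget behind running three days at once is simple integer arithmetic. The figures below are the ones stated in the question (64 cores, 20 workers per image); on another machine, `parallel::detectCores()` could supply the total instead of the hard-coded value.

```r
# Figures taken from the question: a 64-core workstation and
# 20 workers found to be optimal for one image's pixel loop.
cores_total     <- 64
cores_per_image <- 20

# How many images fit side by side without oversubscribing the cores
images_at_once <- cores_total %/% cores_per_image  # integer division
workers_needed <- images_at_once * cores_per_image

cat(images_at_once, "images at once using", workers_needed, "workers\n")
```

This gives 3 images and 60 workers, leaving 4 cores idle.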
Therefore, I was thinking about also parallelizing my outer loop. It would be something like this:
# register cluster to loop over images
registerDoParallel(3)
out2 <- foreach(i = 1:length(number_of_daily_images)) %dopar% {
  # perform some pre-processing on each image
  # register cluster to loop over pixels
  registerDoParallel(20)
  out1 <- foreach(j = 1:length(number_of_pixels_in_each_image)) %dopar% {
    # perform some calculations
  } # end inner loop
} # end outer loop
However, when I run this code, I get an error saying that one of the variables used inside the inner loop does not exist. The same code works fine with a regular for outer loop.
Therefore, my question is: can I nest two %dopar% loops in foreach as I was planning? If not, is there another way to parallelize my outer loop as well?
Foreach maintainer here.
Use the %:% operator, which merges the two foreach loops into a single parallel loop over all (i, j) combinations:
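library(doParallel)  # also attaches foreach
registerDoParallel(60)  # 3 images x 20 workers each
out2 <- foreach(i = 1:length(number_of_daily_images)) %:%
  foreach(j = 1:length(number_of_pixels_in_each_image)) %dopar% {
    # perform some calculations; something() stands for the per-pixel work
    something(i, j)
  }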
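A self-contained toy version of the %:% pattern may help show what comes back. It is a sketch under stated assumptions: `n_images`, `n_pixels` and the `i * 10 + j` body are hypothetical stand-ins for the real images and per-pixel calculation, and 2 workers replace the 60 used above so it runs on any machine. The `.combine` arguments (a standard foreach option) turn the default list of lists into a matrix with one row per image.

```r
library(foreach)
library(doParallel)

# Small worker count for illustration only; the answer above uses 60.
registerDoParallel(cores = 2)

n_images <- 3  # hypothetical stand-in for the number of daily images
n_pixels <- 4  # hypothetical stand-in for the pixels per image

# %:% flattens both loops into one stream of (i, j) tasks, so any worker
# can take any task; no cluster is registered inside another one.
out <- foreach(i = 1:n_images, .combine = "rbind") %:%
  foreach(j = 1:n_pixels, .combine = "c") %dopar% {
    i * 10 + j  # placeholder for the per-pixel calculation
  }

stopImplicitCluster()
print(out)
```

Each row holds one image's pixel results in order (foreach preserves order by default). Because %:% joins the loops directly, there is no place between them for the per-image pre-processing from the question; one option, an assumption rather than part of the answer above, is to pre-process all images first and have the inner body read the prepared data.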