r, foreach, doparallel

Limit iterations in foreach and doParallel


I'm trying to implement a nested for-loop using foreach and doParallel, but I don't want to loop over all combinations of values. Basically, I've got a square dataset and I want to run a function over each pair of values, but I don't need the duplicates: e.g., I need to calculate the function for [1,2], but not [2,1], since the result is the same. Here is a very basic example, though please note that I'm using doParallel because the actual function and calculations are complex.

bvec <- seq(1,10,1)
avec <- seq(1,10,1)

x <- data.frame()
for (i in 1:10) {
  for (j in i:10) {
    x[i,j] <- sim(avec[i], bvec[j])
  }
}
x

The original dataset is about 1800 x 1800 which would result in over 3.2 million calculations if I did all pairwise calculations, which is unnecessary. Here is what I've got for the foreach:

cl <- parallel::makeCluster(max(1, parallel::detectCores()-4))  # keep at least one worker
doParallel::registerDoParallel(cl)
parallel::clusterExport(cl, c("bvec","avec"))  # varlist is a character vector of names
z <-
  foreach(i=1:10, .combine="cbind") %:%
    foreach(j=i:10) %dopar% {
      x[i,j] <- sim(avec[i], bvec[j])
    }
z
parallel::stopCluster(cl)

Is it possible to limit the iterations using foreach? If not, is there any other way to optimize this process?

I've tried changing the foreach statement to

foreach(i=1:10, .combine="cbind") %:%
    foreach(j=i:10) %dopar% {
      x[i,j] <- sim(avec[i], bvec[j])
    }

but that obviously doesn't work.


Solution

  • Edit - The ideas below benchmark slower than the simple loop, and %do% is faster than %dopar%. The difference becomes noticeable at a vector length of about 200. You'll want to benchmark basic parallel processes on your machine to see whether parallelism is worth the overhead going forward.

    ...

    I ran microbenchmark on 1800x1800 data, and your nested for() triangle loop is faster than outer() at that number of calculations, using sum() as the function.
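To make the benchmarking suggestion concrete, here is a minimal base-R sketch that times the triangle loop against outer(). The `sim` function is a placeholder I made up (the real one isn't shown in the question), the vector length of 200 is just the size mentioned above, and `system.time()` stands in for microbenchmark so that nothing extra needs installing. Note that outer() only works this way if the function is vectorised. Timings will differ per machine, so treat this as a template rather than a result.

```r
# Placeholder for the real function; an assumption, not the asker's code.
# It must be vectorised for outer() to work.
sim <- function(a, b) a + b

n <- 200
avec <- seq_len(n)
bvec <- seq_len(n)

# Upper-triangle loop: only pairs with j >= i are evaluated.
tri_loop <- function() {
  x <- matrix(NA_real_, n, n)
  for (i in seq_len(n)) {
    for (j in i:n) {
      x[i, j] <- sim(avec[i], bvec[j])
    }
  }
  x
}

# Full grid: outer() evaluates all n^2 pairs in one vectorised call.
full_outer <- function() outer(avec, bvec, sim)

t_loop  <- system.time(res_loop  <- tri_loop())
t_outer <- system.time(res_outer <- full_outer())

# Elapsed seconds for each approach.
print(c(loop = t_loop[["elapsed"]], outer = t_outer[["elapsed"]]))
```

Swapping your real function in for `sim` and raising `n` toward 1800 should show fairly quickly which approach is worth parallelising at all.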

    Here is a way to nest foreach loops (lifted from the docs at https://cran.r-project.org/web/packages/foreach/vignettes/nested.html ), combined with an ifelse() trick: the inner loop still visits every pair, but skips the heavy function for half of the triangle.

    foreach(b=bvec, .combine='cbind') %:%
        foreach(a=avec, .combine='c') %dopar% {
          ifelse(a>=b, sum(a, b), NA)   # ifelse skips the expensive operation when a < b
        }
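Another option, if the goal is to limit the iterations rather than skip work inside them, is to build the list of (i, j) pairs up front and run a single foreach over that list. This is only a sketch under some assumptions: `sim` is a placeholder for the real function, the worker count of 2 is arbitrary, and I'm relying on foreach exporting the variables referenced in the loop body to the workers. With very cheap functions the per-task overhead will likely dominate, so it may still lose to the plain loop.

```r
library(foreach)
library(doParallel)

sim <- function(a, b) a + b   # placeholder for the real function (assumption)

avec <- seq(1, 10, 1)
bvec <- seq(1, 10, 1)
n <- length(avec)

# Index pairs with j >= i: n * (n + 1) / 2 rows instead of n^2.
pairs <- which(upper.tri(matrix(NA, n, n), diag = TRUE), arr.ind = TRUE)

cl <- parallel::makeCluster(2)   # arbitrary worker count for the sketch
doParallel::registerDoParallel(cl)

# One task per pair; results come back in pair order (.inorder defaults to TRUE).
vals <- foreach(k = seq_len(nrow(pairs)), .combine = "c") %dopar% {
  sim(avec[pairs[k, 1]], bvec[pairs[k, 2]])
}

parallel::stopCluster(cl)

# Fill the upper triangle on the master process; the rest stays NA.
x <- matrix(NA_real_, n, n)
x[pairs] <- vals
x
```

Because the workers only return values and the matrix is filled afterwards on the master, this avoids trying to write to `x` from inside %dopar%.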
    

    The j=i:10 idea combined with writing to a global object works with %do%, but not with %dopar%. This thread https://stackoverflow.com/a/45920140/10276092 discusses why and says "[%dopar%] does not change the global object [x]"

    x <- matrix(NA, nrow = 10, ncol = 10)
    foreach(i=1:10, .combine="cbind") %:%
      foreach(j=i:10, .combine="c", .inorder=TRUE) %do% { # %do% works
        x[i,j] <- sum(avec[i], bvec[j])
      }
    x
    

    The version below kind of works, but cbind recycles the shorter columns to fill in the skipped values, so the triangle shape isn't correct. Matrix magic from https://stackoverflow.com/a/48988950/10276092 makes the data slightly more presentable.

    aa <- foreach(i=1:10, .combine="cbind") %:%
      foreach(j=i:10, .combine="c", .inorder=TRUE) %dopar% {
        sum(avec[i], bvec[j])
      }
    aa[col(aa) + row(aa) > nrow(aa) + 1] <- 0 # drop the recycling
    aa
    
           result.1 result.2 result.3 result.4 result.5 result.6 result.7 result.8 result.9 result.10
     [1,]        2        4        6        8       10       12       14       16       18        20
     [2,]        3        5        7        9       11       13       15       17       19         0
     [3,]        4        6        8       10       12       14       16       18        0         0
     [4,]        5        7        9       11       13       15       17        0        0         0
     [5,]        6        8       10       12       14       16        0        0        0         0
     [6,]        7        9       11       13       15        0        0        0        0         0
     [7,]        8       10       12       14        0        0        0        0        0         0
     [8,]        9       11       13        0        0        0        0        0        0         0
     [9,]       10       12        0        0        0        0        0        0        0         0
    [10,]       11        0        0        0        0        0        0        0        0         0
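If you need a properly shaped result instead of the shifted triangle above, one way is to keep the ragged per-row results as a list and place them into a matrix yourself. The sketch below uses base R only: the `lapply()` call is a stand-in for whatever produces the ragged results (for example an outer foreach that returns a list instead of using cbind), and `a + b` again stands in for the real function. Since the question says [1,2] and [2,1] give the same result, the last step mirrors the upper triangle into the lower one.

```r
avec <- seq(1, 10, 1)
bvec <- seq(1, 10, 1)
n <- length(avec)

# Ragged results: element i holds the values for j = i..n.
# lapply() is a stand-in for the parallel step (assumption).
ragged <- lapply(seq_len(n), function(i) avec[i] + bvec[i:n])

# Place each ragged row at its correct columns; nothing gets recycled.
x <- matrix(NA_real_, n, n)
for (i in seq_len(n)) {
  x[i, i:n] <- ragged[[i]]
}

# Mirror the upper triangle into the lower one (valid only for a symmetric function).
x[lower.tri(x)] <- t(x)[lower.tri(x)]
x
```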