I'm trying to parallelize a for loop that contains a nested loop. The answer is bound to be very similar to "nested foreach loops in R to update common array", but I can't get it to work. I've tried every option I can think of, including turning the inner loop into its own function and parallelizing that, but I keep getting empty lists back.
The first, non-foreach example works:
theFrame <- data.frame(col1=rnorm(100), col2=rnorm(100))
theVector <- 2:30
regFor <- function(dataFrame, aVector, iterations)
{
    #set up a blank results matrix to save into.
    results <- matrix(nrow=iterations, ncol=length(aVector))
    for(i in 1:iterations)
    {
        #set up a blank road map to fill with 1s according to desired parameters
        roadMap <- matrix(ncol=dim(dataFrame)[1], nrow=length(aVector), 0)
        row.names(roadMap) <- aVector
        colnames(roadMap) <- 1:dim(dataFrame)[1]
        for(j in 1:length(aVector))
        {
            #sample some of the 0s and convert to 1s according to desired number of sample
            roadMap[j,][sample(colnames(roadMap),aVector[j])] <- 1
        }
        temp <- apply(roadMap, 1, sum)
        results[i,] <- temp
    }
    results <- as.data.frame(results)
    names(results) <- aVector
    results
}
test <- regFor(theFrame, theVector, 2)
But this and my other similar attempts do not work.
trying <- function(dataFrame, aVector, iterations, cores)
{
    registerDoMC(cores)
    #set up a blank results list to save into. i doubt i need to do this
    results <- list()
    foreach(i = 1:iterations, .combine="rbind") %dopar%
    {
        #set up a blank road map to fill with 1s according to desired parameters
        roadMap <- matrix(ncol=dim(dataFrame)[1], nrow=length(aVector), 0)
        row.names(roadMap) <- aVector
        colnames(roadMap) <- 1:dim(dataFrame)[1]
        foreach(j = 1:length(aVector)) %do%
        {
            #sample some of the 0s and convert to 1s according to desired number of sample
            roadMap[j,][sample(colnames(roadMap),aVector[j])] <- 1
        }
        results[[i]] <- apply(roadMap, 1, sum)
    }
    results
}
test2 <- trying(theFrame, theVector, 2, 2)
I take it that I have to use foreach on the inner loop no matter what, right?
When using foreach, you never "set up a blank results list to save into", as you suspected. Instead, you combine the results of evaluating the body of the foreach loop, and that combined result is returned. In this case, we want the outer foreach loop to combine vectors (computed by the inner foreach loop) row-wise into a matrix. That matrix is assigned to the variable results, which is then converted to a data frame.
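To make the combining behaviour concrete, here is a minimal sketch that uses only the foreach package and the sequential %do% operator, so it runs without a parallel backend. The loop bounds and the variable name combined are made up for illustration and are not part of your example:

```r
library(foreach)

# Outer loop: the value of each iteration becomes one row of the result (rbind).
# Inner loop: the value of each iteration becomes one element of a vector (c).
# %do% runs sequentially, so no backend such as doMC has to be registered.
combined <- foreach(i = 1:3, .combine = "rbind") %do% {
  foreach(j = 1:4, .combine = "c") %do% {
    i * j
  }
}

dim(combined)  # one row per i, one column per j
```

Nothing is assigned into a pre-allocated container here. The matrix is simply the value returned by the outer foreach call.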
Here's my first attempt at converting your example:
library(doMC)
foreachVersion <- function(dataFrame, aVector, iterations, cores) {
    registerDoMC(cores) # unusual, but reasonable with doMC
    rows <- nrow(dataFrame)
    cols <- length(aVector)
    results <-
        foreach(i=1:iterations, .combine='rbind') %dopar% {
            # The value of the inner foreach loop is returned as
            # the value of the body of the outer foreach loop
            foreach(aElem=aVector, .combine='c') %do% {
                roadMapRow <- double(length=rows)
                roadMapRow[sample(rows,aElem)] <- 1
                sum(roadMapRow)
            }
        }
    results <- as.data.frame(results)
    names(results) <- aVector
    results
}
The inner loop doesn't need to be implemented as a foreach loop. You could also use sapply, but I'd try to figure out if there's a faster method. But for this answer, I wanted to demonstrate a foreach method. The only real optimization that I used was to get rid of the call to apply by executing sum inside the inner foreach loop.
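As a rough sketch of the sapply alternative mentioned above, the inner loop could be written as follows. The helper name innerWithSapply is hypothetical and only serves to make the snippet self-contained; its body mirrors the inner foreach loop:

```r
# Hypothetical helper (not part of the original code): computes one row of
# the results, i.e. one value per element of aVector.
innerWithSapply <- function(rows, aVector) {
  sapply(aVector, function(aElem) {
    roadMapRow <- double(length = rows)    # a row of zeros
    roadMapRow[sample(rows, aElem)] <- 1   # set aElem distinct positions to 1
    sum(roadMapRow)
  })
}

oneRow <- innerWithSapply(100, 2:30)
```

A call like this could take the place of the inner foreach loop inside the body of the outer %dopar% loop. Because sample draws without replacement, exactly aElem entries are set to 1 in each row, so in this particular example each sum is equal to aElem.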