[SOLVED] Directly assign results of doMC (foreach) to data frame

Directly assign results of doMC (foreach) to data frame

Lets say I have the example code

kkk<-data.frame(m.mean=1:1000, m.sd=1:1000/20)
kkk[,3:502]<-NA

for (i in 1:nrow(kkk)){
  kkk[i,3:502]<-rnorm(n=500, mean=kkk[i,1], sd=kkk[i,2])
}

I would like to convert this function to run parallel with doMC. My problem is that foreach results in a list, whereas I need the results of each iteration to be a vector that can be then transfered to the data frame (which later will be exported as CVS for further processing).

Any ideas?

Solution

You don't need a loop for this, and putting a large matrix of numbers in a data frame only to treat is as a matrix is inefficient (although you may need to create a data frame at the end after doing all your math in order to write to a CSV file).

m.mean <- 1:1000
m.sd <- 1:1000/20
num.columns <- 500
x <- matrix(nrow=length(m.mean), ncol=num.columns, 
            data=rnorm(n=length(m.mean) * num.columns))
x <- x * cbind(m.sd)[,rep(1,num.columns)] + cbind(m.mean)[,rep(1,num.columns)]
kkk <- data.frame(m.mean=m.mean, m.sd=m.sd, unname(x))
write.csv(kkk, "kkk.txt")

To answer your original question about directly assigning results to an existing data structure from a foreach loop, that is not possible. The foreach package's parallel backends are designed to perform each computation in a separate R process, so each one has to return a separate object to the parent process, which collects them with the .combine function provided to foreach. You could write a parallel foreach loop that assignes directly to the kkk variable, but it would have no effect, because each assignment would happen in the separate processes and would not be shared with the main process.