Tags: r, matrix, parallel-processing, assign, mclapply

How to assign row changes to an existing matrix within an 'apply' function from the parallel package


In R, the <<- operator can be used inside a function passed to lapply() to assign a value to a variable defined outside of it.

Let's consider a matrix filled with 1s:

m<-matrix(data=1, nrow=5, ncol=5)

Let's say I want to replace each row with the values 1, 2, 3, 4 and 5 using the assignment operator <<-. I can use lapply() (it is not designed for this kind of operation; this is only an example):

lapply(X = seq(nrow(m)), FUN = function(r){
  m[r,]<<-seq(5)
})

This will work.

But if I now use mclapply like this:

mclapply(X = seq(nrow(m)), FUN = function(r){
  m[r,]<<-seq(5)
})

The matrix m will remain filled with 1s.

The idea is to apply changes to the rows of a matrix without creating a new one, assigning them in the existing matrix instead. The only constraint is to use a function from the parallel package (e.g. mclapply(), though another function may be a better fit).
Using the <<- operator is not mandatory.
How can I do that?


Solution

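  • You can't assign in parallel this way: mclapply() runs the function in forked child processes, so each worker is just assigning to its own local copy of the matrix and the m in the parent session is never changed.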

    Two solutions:

    1. Use shared memory (e.g. matrices on disk using package {bigstatsr}; disclaimer: I'm the author)

    2. Don't assign in the first place. Just run mclapply(), collect the resulting parts as a list, and combine them with do.call("rbind", list).
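
A minimal sketch of the second option, reusing the 5x5 example from the question. The names rows and n_cores are only illustrative, and the choice of 2 cores is an assumption. Note that this builds the result in the parent process and then overwrites m, rather than modifying m in place from the workers.

```r
library(parallel)

m <- matrix(data = 1, nrow = 5, ncol = 5)

# mclapply() relies on forking, so more than one core is not supported on
# Windows; fall back to a single core there (2 cores is an arbitrary choice).
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each worker returns the new values for its row instead of assigning to m.
rows <- mclapply(X = seq_len(nrow(m)), FUN = function(r) {
  seq(5)
}, mc.cores = n_cores)

# Back in the parent process: stack the returned rows and overwrite m.
m <- do.call("rbind", rows)
```

Because mclapply() returns its results in the same order as X, row r of the combined matrix corresponds to the r-th element of X.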