Tags: r, gpu, doparallel, mclapply, domc

assembling a matrix from diagonal slices with mclapply or %dopar%, like Matrix::bandSparse


Right now I'm working with some huge matrices in R, and I need to be able to reassemble them from diagonal bands. For performance reasons, I wanted to avoid doing n*n operations for a matrix of size n (millions of calculations) and instead do only 2n calculations (thousands), so I chose to run my function on the diagonal bands of the matrix. Now I have the results, but I need to take these matrix slices and assemble them in a way that lets me use multiple processors.

Both foreach and mclapply won't let me modify objects outside of loops, so I'm trying to think of a parallel solution. If there is some function that can reliably assign an off-diagonal band to a part of a matrix, I'm all for it.

input:

[1] 0.3503037

[1] 0.2851895 0.2851895

[1] 0.5233396 0.5233396 0.5233396

[1] 0.6250584 0.6250584 0.6250584 0.6250584

[1] 0.4300964 0.4300964 0.4300964 0.4300964 0.4300964

[1] 0.4300964 0.4300964 0.4300964 0.4300964 0.4300964

[1] 0.3949782 0.3949782 0.3949782 0.3949782

[1] 0.7852812 0.7852812 0.7852812

[1] 0.5309648 0.5309648

[1] 0.7718504

desired output (with parallel operations):

          [,1]      [,2]      [,3]      [,4]      [,5]
[1,] 0.4300964 0.6250584 0.5233396 0.2851895 0.3503037
[2,] 0.3949782 0.4300964 0.6250584 0.5233396 0.2851895
[3,] 0.7852812 0.3949782 0.4300964 0.6250584 0.5233396
[4,] 0.5309648 0.7852812 0.3949782 0.4300964 0.6250584
[5,] 0.7718504 0.5309648 0.7852812 0.3949782 0.4300964

The more I look at this, the more I think what I need is a parallelized version of Matrix::bandSparse.


Solution

  • If you want to build a single matrix in parallel, you are looking for shared-memory parallelism. Both parallel and foreach implement distributed-memory parallelism. I know of one R package that implements shared-memory parallelism (Rdsm), but I have not used it. The more natural way to get shared-memory parallelism is by using C++.

    I have implemented the bands-to-matrix conversion in R (serial), in C++ with Rcpp (serial), and in C++ plus OpenMP with Rcpp and RcppParallel (parallel). Note that the input I used was a list of vectors without the duplicated main diagonal. For the OpenMP solution I converted this list to a (ragged) matrix, since that allows easy conversion to a thread-safe RMatrix. This conversion is fast even in R:

    #include <Rcpp.h>
    #include <algorithm>
    using namespace Rcpp;
    
    // [[Rcpp::export]]
    NumericMatrix diags2mtrCpp(int n, const ListOf<const NumericVector>& diags) {
      NumericMatrix mtr(n, n);
      int nDiags = diags.size();
      for (int i = 0; i < nDiags; ++i) {
        NumericVector diag(diags[i]);
        int nDiag = diag.size();
        // 1-based start position of band i; bands run from the top-right
        // corner (i = 0) down to the bottom-left corner (i = 2 * n - 2)
        int row = std::max(1, i - n + 2);
        int col = std::max(1, n - i);
        for (int j = 0; j < nDiag; ++j) {
          mtr(row + j - 1, col + j - 1) = diag(j);
        }
      }
      return mtr;
    }
    
    // [[Rcpp::plugins(openmp)]]
    #include <omp.h>
    // [[Rcpp::depends(RcppParallel)]]
    #include <RcppParallel.h>
    using namespace RcppParallel;
    
    // [[Rcpp::export]]
    NumericMatrix diags2mtrOmp(const NumericMatrix& diags_matrix, const IntegerVector& diags_length) {
      int nDiags = diags_matrix.cols();
      int n = diags_matrix.rows();
      NumericMatrix res(n, n);
      // RMatrix/RVector wrappers from RcppParallel can be accessed from
      // multiple threads, unlike Rcpp's own data structures
      RMatrix<double> mtr(res);
      RMatrix<double> diags(diags_matrix);
      RVector<int> diagSize(diags_length);
      // every band writes to a disjoint set of elements, so the loop
      // parallelizes without any synchronization
      #pragma omp parallel for
      for (int i = 0; i < nDiags; ++i) {
        int nDiag = diagSize[i];
        int row = std::max(1, i - n + 2);
        int col = std::max(1, n - i);
        for (int j = 0; j < nDiag; ++j) {
          mtr(row + j - 1, col + j - 1) = diags(j, i);
        }
      }
      return res;
    }
    
    
    /*** R
    set.seed(42)
    n <- 2^12
    n
    diags <- vector(mode = "list", length = 2 * n - 1)
    for (i in seq_len(n)) {
      diags[[i]] <- rep.int(runif(1), i)
      diags[[2 * n - i]] <- rep.int(runif(1), i)
    }
    
    diags_matrix <- matrix(0, nrow = n, ncol = length(diags))
    diags_length <- integer(length(diags))
    for (i in seq_along(diags)) {
      diags_length[i] <- length(diags[[i]])
      diags_matrix[ ,i] <- c(diags[[i]], rep.int(0, n - diags_length[i]))
    }
    
    
    diags2mtr <- function(n, diags) {
      mtr <- matrix(0, n, n)
      for (i in seq_along(diags)) {
        row <- max(1, i - n + 1)
        col <- max(1, n + 1 - i)
        for (j in seq_along(diags[[i]]))
          mtr[row + j - 1, col + j - 1] <- diags[[i]][j]
      }
      mtr
    }
    system.time(mtr <- diags2mtr(n, diags))
    system.time(mtrCpp <- diags2mtrCpp(n, diags))
    system.time(mtrOmp <- diags2mtrOmp(diags_matrix, diags_length))
    all.equal(mtr, mtrCpp)
    all.equal(mtr, mtrOmp)
    */
    

    Benchmarking these solutions with microbenchmark on a dual-core machine gives me:

    Unit: milliseconds
             expr        min        lq      mean    median        uq       max neval
        diags2mtr 2252.82538 2271.7221 2354.1251 2323.8221 2382.7958 2558.9282    10
     diags2mtrCpp  161.25920  190.9728  224.9094  226.2652  265.3675  279.3848    10
     diags2mtrOmp   95.50714  100.9555  105.8462  102.4064  105.7645  127.5200    10
    

    I am surprised by the speedup from diags2mtrOmp: on a dual-core machine it is more than twice as fast as the serial C++ version.