c++reigenrcpprcppeigen

How to multiply two matrices that are sparse Matrix::csr/csc format?


The following code works as expected:

matrix.cpp

// [[Rcpp::depends(RcppEigen)]]

#include <RcppEigen.h>

// [[Rcpp::export]]
SEXP eigenMatTrans(Eigen::MatrixXd A){
    Eigen::MatrixXd C = A.transpose();

    return Rcpp::wrap(C);
}

// [[Rcpp::export]]
SEXP eigenMatMult(Eigen::MatrixXd A, Eigen::MatrixXd B){
    Eigen::MatrixXd C = A * B;

    return Rcpp::wrap(C);
}

// [[Rcpp::export]]
SEXP eigenMapMatMult(const Eigen::Map<Eigen::MatrixXd> A, Eigen::Map<Eigen::MatrixXd> B){
    Eigen::MatrixXd C = A * B;

    return Rcpp::wrap(C);
}

This is using the C++ eigen class for matrices, See https://eigen.tuxfamily.org/dox

In R, I can access those functions.

library(Rcpp);
Rcpp::sourceCpp('matrix.cpp');  

A <- matrix(rnorm(10000), 100, 100);
B <- matrix(rnorm(10000), 100, 100);
library(microbenchmark);

microbenchmark(eigenMatTrans(A), t(A), A%*%B, eigenMatMult(A, B), eigenMapMatMult(A, B))

This shows that R performs pretty well on resorting (transpose). Multiplying has some advantages with eigen.

Using the Matrix library, I can convert a normal matrix to a sparse matrix.

Example from https://cmdlinetips.com/2019/05/introduction-to-sparse-matrices-in-r/

library(Matrix);
data<- rnorm(1e6)
zero_index <- sample(1e6)[1:9e5]
data[zero_index] <- 0
A = matrix(data, ncol=1000)

A.csr = as(A, "dgRMatrix");
B.csr = t(A.csr);

A.csc = as(A, "dgCMatrix");
B.csc = t(A.csc);

So if I wanted to multiply A.csr times B.csr using eigen, how to do that in C++? I do not want to have to convert types if I don't have to. It is a memory size thing.

The A.csr %*% B.csr is not-yet-implemented. The A.csc %*% B.csc is working.

I would like to microbenchmark the different options, and see how matrix size will be most efficient. In the end, I will have a matrix that is about 1% sparse and have 5 million rows and cols ...


Solution

  • There's a reason that dgRMatrix crossproduct functions are not yet implemented, in fact, they should not be implemented because otherwise they would enable bad practice.

    There are a few performance considerations when working with sparse matrices:

    There may be applications where row-major ordering shines (i.e. see the work by Dmitry Selivanov on CSR matrices and irlba svd), but this is absolutely not one of them, in fact, so much so you are better off doing in-place conversion to get to a CSC matrix.

    tl;dr: column-wise cross-product in row-major matrices is the ultimatum of inefficiency.