I'm working on an Rcpp sparse matrix class that uses both Rcpp::IntegerVector
(row/column pointers) and a templated std::vector<T>
. The rationale is that overhead in deep-copying the integer pointer vectors (@i
, @p
) in extremely large sparse-matrices can be avoided by simply leaving them as pointers to R objects, and consistently, microbenchmarks show that this approach takes almost exactly half the time as conversion to Eigen::SparseMatrix<T>
and arma::SpMat<T>
while using less memory.
Bare-bones Rcpp sparse matrix class
namespace SpRcpp {
template<typename T>
class SpMatrix {
public:
Rcpp::IntegerVector i, p;
std::vector<T> x;
unsigned int n_row, n_col;
// constructor for the class from an Rcpp::S4 object
SpMatrix(Rcpp::S4& mat) {
Rcpp::IntegerVector dims = mat.slot("Dim");
n_row = (unsigned int)dims[0];
i = mat.slot("i");
p = mat.slot("p");
x = Rcpp::as<std::vector<T>>(mat.slot("x"));
};
// other constructors, class methods, iterators, etc.
};
}
Example usage:
//[[Rcpp::export]]
std::vector<float> SpRcpp_SpMatrix(Rcpp::S4& mat) {
SpRcpp::SpMatrix<float> A(mat);
return A.x;
}
This works!
However, I'd like to have Rcpp implicitly convert an S4 dgCMatrix object, for example, to an SpRcpp::SpMatrix
object to enable functions like the following:
Implicit Rcpp wrapping
//[[Rcpp::export]]
std::vector<float> SpRcpp_SpMatrix2(SpRcpp::SpMatrix<float>& mat) {
return mat.x;
}
Such is a case of using Rcpp::as
.
I've tried the following:
namespace Rcpp {
namespace traits {
template <typename T>
class Exporter< SpRcpp::SpMatrix<T> > {
public:
Exporter(SEXP x) { Rcpp::S4 mat = x; }
SpRcpp::SpMatrix<T> get() {
return SpRcpp::SpMatrix<T>(mat);
}
private: Rcpp::S4 mat;
};
}
}
This compiles, and I know that the SpRcpp::SpMatrix<T>(Rcpp::S4& x)
constructor works, but when I attempt to feed a dgCMatrix into SpRcpp_SpMatrix()
I get the error:
Error in SpRcpp_SpMatrix2(A) : Not an S4 object.
I presume this is because I've declared the following before all my class declaration:
#include <RcppCommon.h>
#include <Rcpp.h>
Per the docs here, the RcppGallery example, and the RcppArmadillo implementation, #include <Rcpp.h>
must not precede the Rcpp::as
and Rcpp::wrap
functions, but in my case I can't do that because my class definition requires Rcpp.h
.
QUESTION: How do I create an Rcpp::as
for SpRcpp::SpMatrix
from an Rcpp::S4
dgCMatrix when the SpMatrix
class depends on Rcpp.h
?
It's actually quite simple to create an Rcpp SparseMatrix class! I was overthinking it.
#include <rcpp.h>
// Rcpp for sparse matrices (spRcpp)
namespace Rcpp {
class SparseMatrix {
public:
Rcpp::IntegerVector i, p;
Rcpp::NumericVector x;
int n_rows, n_cols;
// constructor
SparseMatrix(Rcpp::S4 mat) {
Rcpp::IntegerVector dim = mat.slot("Dim");
i = mat.slot("i");
p = mat.slot("p");
x = mat.slot("x");
n_rows = (int)dim[0];
n_cols = (int)dim[1];
};
};
}
namespace Rcpp {
template <> Rcpp::SparseMatrix as(SEXP mat) {
return Rcpp::SparseMatrix(mat);
}
}
//[[Rcpp::export]]
Rcpp::NumericVector toRcppSparseMatrix(Rcpp::SparseMatrix& A) {
return A.x;
}
Given a Matrix::dgCMatrix
, mat
, calling toRcppSparseMatrix(mat)
returns non-zero values in 1-2 microseconds for 25 million values. This is in contrast to RcppArmadillo or RcppEigen sparse matrix conversions which take about 250 milliseconds for this same matrix and runs a deep copy in-memory.
As Dirk suggested, using RcppArmadillo ivec
and dvec
were very efficient, but still created shallow copies which resulted in ~100 milliseconds runtime and consumed some memory.
Obviously, the above approach is restricted to double types, so float operations are not possible without a deep copy.