c++rrcpps-expression

Extending Rcpp::as for custom classes depending on Rcpp.h


I'm working on an Rcpp sparse matrix class that uses both Rcpp::IntegerVector (row/column pointers) and a templated std::vector<T>. The rationale is that overhead in deep-copying the integer pointer vectors (@i, @p) in extremely large sparse-matrices can be avoided by simply leaving them as pointers to R objects, and consistently, microbenchmarks show that this approach takes almost exactly half the time as conversion to Eigen::SparseMatrix<T> and arma::SpMat<T> while using less memory.

Bare-bones Rcpp sparse matrix class

namespace SpRcpp {

    template<typename T>
    class SpMatrix {

    public:
        Rcpp::IntegerVector i, p;
        std::vector<T> x;
        unsigned int n_row, n_col;

        // constructor for the class from an Rcpp::S4 object
        SpMatrix(Rcpp::S4& mat) {
            Rcpp::IntegerVector dims = mat.slot("Dim");
            n_row = (unsigned int)dims[0];
            i = mat.slot("i");
            p = mat.slot("p");
            x = Rcpp::as<std::vector<T>>(mat.slot("x"));
        };
        // other constructors, class methods, iterators, etc.
    };
}

Example usage:

//[[Rcpp::export]]
std::vector<float> SpRcpp_SpMatrix(Rcpp::S4& mat) {
    SpRcpp::SpMatrix<float> A(mat);
    return A.x;
}

This works!

However, I'd like to have Rcpp implicitly convert an S4 dgCMatrix object, for example, to an SpRcpp::SpMatrix object to enable functions like the following:

Implicit Rcpp wrapping

//[[Rcpp::export]]
std::vector<float> SpRcpp_SpMatrix2(SpRcpp::SpMatrix<float>& mat) {
    return mat.x;
}

Such is a case of using Rcpp::as.

I've tried the following:

namespace Rcpp {
    namespace traits {
        template <typename T>
        class Exporter< SpRcpp::SpMatrix<T> > {
        public:
            Exporter(SEXP x) { Rcpp::S4 mat = x; }
            SpRcpp::SpMatrix<T> get() {
                return SpRcpp::SpMatrix<T>(mat);
            }
        private: Rcpp::S4 mat;
        };
    }
}

This compiles, and I know that the SpRcpp::SpMatrix<T>(Rcpp::S4& x) constructor works, but when I attempt to feed a dgCMatrix into SpRcpp_SpMatrix() I get the error:

Error in SpRcpp_SpMatrix2(A) : Not an S4 object.

I presume this is because I've declared the following before all my class declaration:

#include <RcppCommon.h>
#include <Rcpp.h>

Per the docs here, the RcppGallery example, and the RcppArmadillo implementation, #include <Rcpp.h> must not precede the Rcpp::as and Rcpp::wrap functions, but in my case I can't do that because my class definition requires Rcpp.h.

QUESTION: How do I create an Rcpp::as for SpRcpp::SpMatrix from an Rcpp::S4 dgCMatrix when the SpMatrix class depends on Rcpp.h?


Solution

  • It's actually quite simple to create an Rcpp SparseMatrix class! I was overthinking it.

    #include <rcpp.h>
    
    // Rcpp for sparse matrices (spRcpp)
    namespace Rcpp {
        class SparseMatrix {
        public:
            Rcpp::IntegerVector i, p;
            Rcpp::NumericVector x;
            int n_rows, n_cols;
    
            // constructor
            SparseMatrix(Rcpp::S4 mat) {
                Rcpp::IntegerVector dim = mat.slot("Dim");
                i = mat.slot("i");
                p = mat.slot("p");
                x = mat.slot("x");
                n_rows = (int)dim[0];
                n_cols = (int)dim[1];
            };
        };
    }
    
    namespace Rcpp {
        template <> Rcpp::SparseMatrix as(SEXP mat) {
            return Rcpp::SparseMatrix(mat);
        }
    }
    
    //[[Rcpp::export]]
    Rcpp::NumericVector toRcppSparseMatrix(Rcpp::SparseMatrix& A) {
        return A.x;
    }
    

    Given a Matrix::dgCMatrix, mat, calling toRcppSparseMatrix(mat) returns non-zero values in 1-2 microseconds for 25 million values. This is in contrast to RcppArmadillo or RcppEigen sparse matrix conversions which take about 250 milliseconds for this same matrix and runs a deep copy in-memory.

    As Dirk suggested, using RcppArmadillo ivec and dvec were very efficient, but still created shallow copies which resulted in ~100 milliseconds runtime and consumed some memory.

    Obviously, the above approach is restricted to double types, so float operations are not possible without a deep copy.