c++iteratorsparse-matrixconst-iterator

Writing a C++ iterator for a sparse matrix class


I'm attempting to get a basic constant forward-iterator to work in C++.

namespace Rcpp {
    class SparseMatrix {
    public:
        IntegerVector i, p;
        NumericVector x;
   
        int begin_col(int j) { return p[j]; };
        int end_col(int j) { return p[j + 1]; };
        
        class iterator {
        public:
            int index;
            iterator(SparseMatrix& g) : parent(g) {}
            iterator(int ind) { index = ind; };                       // ERROR!
            bool operator!=(int x) const { return index != x; };
            iterator operator++(int) { ++index; return (*this); };
            int row() { return parent.i[index]; };
            double value() { return parent.x[index]; };
        private:
            SparseMatrix& parent;
        };
    };    
}

My intention is to use the iterator in contexts similar to the following:

// sum of values in column 7
Rcpp::SparseMatrix A(nrow, ncol, fill::random);
double sum = 0;
for(Rcpp::SparseMatrix::iterator it = A.begin_col(7); it != A.end_col(7); it++)
    sum += it.value();

Two questions:

  1. The compiler throws an error on the line indicated above: uninitialized reference member in 'class Rcpp::SparseMatrix&' [-fpermissive]. How can this be fixed?
  2. How might double value() { return parent.x[index]; }; be re-worked to return a pointer to the value rather than a copy of the value?

A little context on the SparseMatrix class: like a dgCMatrix in R, this object of class SparseMatrix consists of three vectors:


Solution

  • Thanks to @Evg, here's the solution:

    namespace Rcpp {
        class SparseMatrix {
        public:
            IntegerVector i, p;
            NumericVector x;
       
            class iterator {
            public:
                int index;
                iterator(SparseMatrix& g, int ind) : parent(g) { index = ind; }
                bool operator!=(iterator x) const { return index != x.index; };
                iterator& operator++() { ++index; return (*this); };
                int row() { return parent.i[index]; };
                double& value() { return parent.x[index]; };
            private:
                SparseMatrix& parent;
            };
    
            iterator begin_col(int j) { return iterator(*this, p[j]); };
            iterator end_col(int j) { return iterator(*this, p[j + 1]); };
        };    
    }
    

    And it can be used as follows, for instance, to calculate colSums:

    //[[Rcpp::export]]
    Rcpp::NumericVector Rcpp_colSums(Rcpp::SparseMatrix& A) {
        Rcpp::NumericVector sums(A.cols());
        for (int i = 0; i < A.cols(); ++i)
            for (Rcpp::SparseMatrix::iterator it = A.begin_col(i); it != A.end_col(i); it++)
                sums(i) += it.value();
        return sums;
    }
    

    And, the above function is faster than RcppArmadillo, RcppEigen, and R::Matrix equivalents when microbenchmarked from R!

    Edit:

    The above syntax is inspired by Armadillo. I've come to realize that a slightly different syntax (which involves fewer constructions) gives an iterator similar to Eigen:

    class col_iterator {
        public:
          col_iterator(SparseMatrix& ptr, int col) : ptr(ptr) { indx = ptr.p[col]; max_index = ptr.p[col + 1]; }
          operator bool() const { return (indx != max_index); }
          col_iterator& operator++() { ++indx; return *this; }
          const double& value() const { return ptr.x[indx]; }
          int row() const { return ptr.i[indx]; }
        private:
          SparseMatrix& ptr;
          int indx, max_index;
        };
    

    Which can then be used like this:

    int col = 0;
    for (Rcpp::SparseMatrix::col_iterator it(A, col); it; ++it)
         Rprintf("row: %3d, value: %10.2e", it.row(), it.value());