c++ rcpp r-bigmemory

How to modify values of a file-backed matrix with bigmemory and Rcpp


I am using the R bigmemory package and Rcpp to handle big matrices (1 to 10 million columns x 1000 rows). Once I have read an integer matrix consisting of 0, 2 and NA into a file-backed bigmemory matrix in R, I would like to modify all the NA values through C++, either to impute the per-column mean or to impute an arbitrary value (I show the latter here).

Below is the Rcpp function I have written, which does not work. My hope was that calling BigNA(mybigmatrix@address) from R would find the NA elements in the matrix and modify their values directly in the backing file.

I think the problem might be in the evaluation of std::isnan(mat[j][i]). I checked this by writing an alternative function that counts the NA values with an accumulator, and it indeed did not count any. Even once this is solved, I am not sure whether the expression mat[j][i] = 1 would modify the value in the backing file. Writing those statements feels intuitive to me, coming from an R background, but it might be wrong.

Any help/suggestion would be very much appreciated.

#include <stdio.h>
#include <Rcpp.h>
#include <bigmemory/MatrixAccessor.hpp>
#include <numeric>
// [[Rcpp::depends(BH, bigmemory)]]
// [[Rcpp::depends(Rcpp)]]

using namespace Rcpp;  // XPtr is used unqualified below

// [[Rcpp::export]]
void BigNA(SEXP pBigMat) {
  /*
  * Impute "NA" values with "1" in a big matrix of 0, 2 and NA.
  */

  // Create the external bigmatrix pointer and initialize the matrix accessor
  XPtr<BigMatrix> xpMat(pBigMat);
  MatrixAccessor<int> mat = (*xpMat);

  // Iterate over the elements of the matrix and, when an NA is found, replace it with "1"
  for(int i=0; i< xpMat->ncol(); i++){
    for(int j=0; j< xpMat->nrow(); j++){
      if(std::isnan(mat[j][i])){ 
        mat[j][i] = 1;
      }
    }
  }
} 

Solution

  • The problem stems from the difference between NA in R and NAN in C++.

    MatrixAccessor<int> gives you an accessor for values of type int. Any number in R can be NA, but an int in C++ is never NaN: R stores an integer NA as a reserved int value (NA_INTEGER), not as a floating-point NaN. An optimizing compiler can therefore drop std::isnan(x) entirely when x is of type int, as in your case.

    To fix this, you could either:

    Related: Extracting a column with NA's from a bigmemory object in Rcpp
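
A minimal sketch of one such fix for the integer case, assuming R's convention that an integer NA is stored as the reserved value INT_MIN (exposed to C and C++ code as NA_INTEGER): compare each element against that sentinel instead of calling std::isnan. The block is plain C++ so it compiles without R. kNaInteger, impute_na and the column-major std::vector buffer are hypothetical stand-ins for NA_INTEGER and the file-backed matrix; they are not part of bigmemory or Rcpp.

```cpp
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for R's NA_INTEGER, which R defines as INT_MIN.
constexpr int kNaInteger = INT_MIN;

// Replace every NA sentinel in a column-major buffer with `value`.
// Returns the number of cells that were replaced.
std::size_t impute_na(std::vector<int>& data,
                      std::size_t nrow,
                      std::size_t ncol,
                      int value) {
    std::size_t replaced = 0;
    for (std::size_t col = 0; col < ncol; ++col) {
        // Pointer to the start of one column, analogous to mat[col]
        // on a bigmemory MatrixAccessor<int>.
        int* column = data.data() + col * nrow;
        for (std::size_t row = 0; row < nrow; ++row) {
            // Integer comparison against the sentinel; std::isnan is not
            // meaningful for int values.
            if (column[row] == kNaInteger) {
                column[row] = value;
                ++replaced;
            }
        }
    }
    return replaced;
}
```

In the Rcpp function from the question, the same test would read mat[i][j] == NA_INTEGER with i as the column index, because MatrixAccessor's operator[] takes the column first and returns a pointer to that column. Since the accessor points into the memory-mapped backing file, assigning through it should update the file-backed matrix in place; it is worth confirming this from R, for example by counting the NAs before and after the call.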