
Fast NMF in R on sparse matrices


I'm looking for a fast NMF implementation for sparse matrices in R.

The R NMF package offers a number of algorithms, none of which is impressive in terms of computation time.

NNLM::nnmf() seems to be the state of the art in R at the moment, specifically with method = "scd" and loss = "mse", which is implemented as alternating least squares solved by sequential coordinate descent. However, this method is quite slow on very large, very sparse matrices.
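To make the alternating scheme concrete, here is a minimal base-R sketch of alternating least squares in which each non-negative subproblem is solved by sequential coordinate descent. It is not NNLM's actual code: the function names `scd_nnls` and `nmf_scd`, the fixed number of sweeps, and the random initialization are all assumptions made for illustration, and it works on an ordinary dense matrix, so it does not exploit sparsity.

```r
# Coordinate descent for min ||target - basis %*% X||^2 subject to X >= 0,
# expressed through G = t(basis) %*% basis and B = t(basis) %*% target.
# (Hypothetical helper, written for illustration only.)
scd_nnls <- function(G, B, X, sweeps = 10) {
  for (s in seq_len(sweeps)) {
    for (i in seq_len(nrow(G))) {
      if (G[i, i] > 0) {
        # Negative half-gradient for row i, across all columns at once
        grad <- B[i, ] - as.vector(G[i, , drop = FALSE] %*% X)
        # Exact one-dimensional minimizer, clipped at zero
        X[i, ] <- pmax(0, X[i, ] + grad / G[i, i])
      }
    }
  }
  X
}

# Alternate between updating H (W fixed) and W (H fixed).
nmf_scd <- function(A, k, iters = 50, seed = 1) {
  set.seed(seed)
  W <- matrix(runif(nrow(A) * k), nrow(A), k)
  H <- matrix(runif(k * ncol(A)), k, ncol(A))
  for (it in seq_len(iters)) {
    H <- scd_nnls(crossprod(W), crossprod(W, A), H)
    W <- t(scd_nnls(tcrossprod(H), tcrossprod(H, A), t(W)))
  }
  list(W = W, H = H)
}

# Small made-up example: a 200 x 50 count matrix that is mostly zeros
set.seed(42)
A <- matrix(rpois(200 * 50, lambda = 0.3), 200, 50)
fit <- nmf_scd(A, k = 5)
mean((A - fit$W %*% fit$H)^2)  # mean squared reconstruction error
```

Each coordinate update minimizes the squared error exactly along one row of the factor, so the loss cannot increase from one iteration to the next. The cost is dominated by the products with A, which is where a sparse-aware implementation would save time.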

The rsparse::WRMF function is extremely fast, but only because it uses just the positive values in A for the row-wise computation of W and H.

Is there any reasonable implementation for solving NMF on a sparse matrix?

Is there an equivalent to scikit-learn in R? See this question

There are various worker functions in R, such as fnnls and tsnnls, none of which surpasses nnls::nnls (written in Fortran). I have been unable to build any of these functions into a faster NMF framework.


Solution

  • Forgot I even posted this question, but one year later...

    I wrote a very fast implementation of NMF using RcppEigen; see the RcppML R package on CRAN.

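    # stable release from CRAN
    install.packages("RcppML")
    
    # development version from GitHub (requires the devtools package)
    devtools::install_github("zdebruine/RcppML")
    
    # open the documentation for the NMF function
    ?RcppML::nmf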
    

    It's at least an order of magnitude faster than NNLM::nnmf. For comparison, the runtime of RcppML::nmf rivals that of the irlba::irlba SVD (although that's an altogether different algorithm).

    I've successfully applied my implementation to a 96% sparse matrix of 1.3 million single cells by 26,000 genes, computing a rank-100 factorization in 1 minute. I think that's very reasonable.
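As a rough illustration of how such a call might look, the sketch below factorizes a small simulated sparse matrix. The data are made up (a hypothetical 2,000 x 500 matrix with about 96% zeros, far smaller than the single-cell dataset mentioned above), and the matrix and rank are passed positionally because argument names and the structure of the returned object may differ between RcppML versions; check `?RcppML::nmf` for the version you have installed.

```r
library(Matrix)

set.seed(1)
# Simulated stand-in data: about 4% of entries are non-zero positive counts
A <- rsparsematrix(2000, 500, density = 0.04,
                   rand.x = function(n) rpois(n, 3) + 1)

# Rank-10 factorization; matrix and rank are given positionally
model <- RcppML::nmf(A, 10)

# Inspect what came back rather than assuming specific field names
str(model, max.level = 1)
```

Keeping the input in a sparse column-compressed format (`dgCMatrix`), instead of converting it to a dense matrix first, is what makes it practical to scale this to much larger inputs.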