Tags: c++, r, rcpp, mlpack

Problem with Using RcppMLPACK in My Own R Package


I am trying to develop an R package that uses the k-means functionality from RcppMLPACK. The header section of my C++ file is below:

#include <RcppArmadillo.h>
#include <RcppMLPACK.h>
#include <RcppGSL.h>
#include <RcppDist.h>
#include <sstream>
#include <iostream>
#include <fstream>
#include <omp.h>
#include <gsl/gsl_math.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_sf.h>

// [[Rcpp::depends(RcppProgress)]]
#include <progress.hpp>
#include <progress_bar.hpp>


// [[Rcpp::depends(RcppArmadillo,RcppDist)]]
// [[Rcpp::depends(RcppMLPACK)]]
// [[Rcpp::depends(RcppGSL)]]
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::plugins(openmp)]]

using namespace mlpack::kmeans ;
using namespace arma;

My Makevars file is below:

CXX_STD = CXX17
GSL_CFLAGS=`${R_HOME}/bin/Rscript -e "RcppGSL:::CFlags()"`
GSL_LIBS=`${R_HOME}/bin/Rscript -e "RcppGSL:::LdFlags()"`
RCPP_LDFLAGS=`${R_HOME}/bin/Rscript -e "Rcpp:::LdFlags()"`

PKG_CXXFLAGS = $(SHLIB_OPENMP_CXXFLAGS) $(GSL_CFLAGS)
PKG_LIBS = $(SHLIB_OPENMP_CXXFLAGS) $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS) $(GSL_LIBS) $(RCPP_LDFLAGS) 

I am using macOS Ventura. When I try to build my R package, I get the following error:

In file included from /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/RcppMLPACK/include/mlpack/core.hpp:171,
                 from /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/RcppMLPACK/include/RcppMLPACK.h:4,
                 from RcppExports.cpp:6:
/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/RcppMLPACK/include/mlpack/prereqs.hpp:46:10: fatal error: boost/math/special_functions/gamma.hpp: No such file or directory
  >> 46 | #include <boost/math/special_functions/gamma.hpp>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

However, if I simply call Rcpp::sourceCpp() on my C++ file, it compiles perfectly. Please help me debug this.

P.S. I am using gcc instead of clang. Both Boost and mlpack are installed on my system.


Solution

  • The topic is a little underdocumented: mlpack is a large package and contains a lot, but there is no 'quick start' for using it from R. At the same time, your question may have overcomplicated things by including several kitchen sinks' worth of libraries. I find that adding too much too early muddles things.

    So here is what I did (using mlpack 3.4.2; see below for mlpack 4.0.1):

    Viability

    I first created a minimal C++ file that includes just the two headers and does not do much.

    It looked like this, give or take:

    #include <Rcpp/Rcpp>
    #include <mlpack.h>
        
    // [[Rcpp::depends(RcppArmadillo)]]
    // [[Rcpp::depends(mlpack)]]
    
    // [[Rcpp::export]]
    void foo() {
        Rcpp::Rcout << "Foo\n";
    }
    
    /*** R
    foo()
    */
    

    If this compiles, mlpack is found. I have the CRAN package installed.
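    The same start-small check can be pointed at the header named in the error message from the question. As a hedged sketch (the names kBoostGammaVisible and boostGammaStatus are hypothetical, and __has_include needs C++17 or a compiler that offers it as an extension), this probe reports at compile time whether boost/math/special_functions/gamma.hpp is on the include path of the compilation it is part of:

```cpp
// Compile-time probe: __has_include is evaluated by the preprocessor, so the
// result reflects the include path of *this* compilation only.
#if defined(__has_include)
#  if __has_include(<boost/math/special_functions/gamma.hpp>)
constexpr bool kBoostGammaVisible = true;
#  else
constexpr bool kBoostGammaVisible = false;
#  endif
#else
// The compiler offers no __has_include, so we cannot tell; report "not found".
constexpr bool kBoostGammaVisible = false;
#endif

// Human-readable status, e.g. for printing with Rcpp::Rcout.
constexpr const char* boostGammaStatus() {
    return kBoostGammaVisible
        ? "boost/math/special_functions/gamma.hpp: found"
        : "boost/math/special_functions/gamma.hpp: NOT found";
}
```

    The answer is only meaningful when the probe is compiled with the same compiler and -I flags as the package build; a standalone compile with different flags can give a different result. Under that assumption, a "NOT found" would point at the include path used for the package rather than at RcppMLPACK itself.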

    Running kmeans (mlpack 3.4.*, see below for 4.0.1)

    This gets a little more complicated for me, as I (mainly) work on Ubuntu 22.10, which only ships the older mlpack 3.4.2 as a convenient system library. I think that with a newer mlpack 4.* release I would not need to link.

    As I often do, I took a simple example from the unit tests. It has data as well as an invocation. The full file now is as follows:

    #include <Rcpp/Rcpp>
    #include <mlpack.h>
    
    // Two include directories adjusted for my use of mlpack 3.4.2 on Ubuntu
    #include <mlpack/core.hpp>
    #include <mlpack/methods/kmeans/kmeans.hpp>
    #include <mlpack/methods/kmeans/random_partition.hpp>
    #include <mlpack/methods/neighbor_search/neighbor_search.hpp>
    
    // [[Rcpp::depends(RcppArmadillo)]]
    // [[Rcpp::depends(mlpack)]]
    
    // This is 'borrowed' from mlpack's own unit tests in
    // src/mlpack/tests/kmeans_test.cpp: we borrow the data set and the
    // code from the first test function. Passing data from R is easy thanks
    // to RcppArmadillo, and is 'left as an exercise'.
    
    // Generate dataset; written transposed because it's easier to read.
    arma::mat kMeansData("  0.0   0.0;" // Class 1.
                         "  0.3   0.4;"
                         "  0.1   0.0;"
                         "  0.1   0.3;"
                         " -0.2  -0.2;"
                         " -0.1   0.3;"
                         " -0.4   0.1;"
                         "  0.2  -0.1;"
                         "  0.3   0.0;"
                         " -0.3  -0.3;"
                         "  0.1  -0.1;"
                         "  0.2  -0.3;"
                         " -0.3   0.2;"
                         " 10.0  10.0;" // Class 2.
                         " 10.1   9.9;"
                         "  9.9  10.0;"
                         " 10.2   9.7;"
                         " 10.2   9.8;"
                         "  9.7  10.3;"
                         "  9.9  10.1;"
                         "-10.0   5.0;" // Class 3.
                         " -9.8   5.1;"
                         " -9.9   4.9;"
                         "-10.0   4.9;"
                         "-10.2   5.2;"
                         "-10.1   5.1;"
                         "-10.3   5.3;"
                         "-10.0   4.8;"
                         " -9.6   5.0;"
                         " -9.8   5.1;");
    
    
    // [[Rcpp::export]]
    arma::Row<size_t> kmeansDemo() {
    
        mlpack::kmeans::KMeans<mlpack::metric::EuclideanDistance, 
                               mlpack::kmeans::RandomPartition> kmeans;
    
        arma::Row<size_t> assignments;
        kmeans.Cluster((arma::mat) trans(kMeansData), 3, assignments);
    
        return assignments;
    }
    
    /*** R
    kmeansDemo()
    */
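    The data set is written one point per row because that is easier to read, but mlpack treats each column of a matrix as one observation, which is why the code passes trans(kMeansData) to Cluster(). A stdlib-only sketch of that reshaping follows; the helper name toColumnMajorPoints is hypothetical, and it assumes the d x n result is kept in a flat buffer in column-major order, the storage order Armadillo uses for arma::mat:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: takes n points written one per row (as kMeansData is
// written) and returns the d x n transpose as a flat column-major buffer, so
// column j holds the d coordinates of point j.
std::vector<double> toColumnMajorPoints(
        const std::vector<std::vector<double>>& rows) {
    const std::size_t n = rows.size();
    const std::size_t d = n ? rows[0].size() : 0;
    std::vector<double> out(d * n);
    for (std::size_t j = 0; j < n; ++j) {
        assert(rows[j].size() == d);       // every point has d coordinates
        for (std::size_t i = 0; i < d; ++i)
            out[i + j * d] = rows[j][i];   // element (i, j) of a d x n matrix
    }
    return out;
}
```

    In the result, the coordinates of point j are contiguous in column j.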
    

    Now, because I am on mlpack 3.4.2, I also have to link, so I need to run Sys.setenv("PKG_LIBS"="-lmlpack") first. I also had to adjust the headers slightly from the example I took from the repo, which is set up for mlpack 4.1.*.

    The link step will vary depending on where you are running this.

    But with that, my R session produces the result:

    > Sys.setenv("PKG_LIBS"="-lmlpack")
    > Rcpp::sourceCpp("~/git/stackoverflow/76319284/answer.cpp")
    
    > kmeansDemo()
         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30]
    [1,]    2    2    2    2    2    2    2    2    2     2     2     2     2     1     1     1     1     1     1     1     0     0     0     0     0     0     0     0     0     0
    > 
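    For a sense of what Cluster() does with those 30 points: by default mlpack's KMeans runs Lloyd-style iterations (assign every point to its nearest centroid, then move each centroid to the mean of its points). The sketch below is a plain C++ illustration of that loop, not mlpack's code. It assumes caller-supplied starting centroids in place of RandomPartition, stops when no assignment changes, and uses squared Euclidean distance, which picks the same nearest centroid as EuclideanDistance; the names Point, sqDist and lloydKMeans are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::array<double, 2>;

// Squared Euclidean distance; minimizing it selects the same centroid as
// minimizing the plain Euclidean distance.
double sqDist(const Point& a, const Point& b) {
    const double dx = a[0] - b[0], dy = a[1] - b[1];
    return dx * dx + dy * dy;
}

// Minimal Lloyd's k-means sketch. `centroids` holds the (hypothetical,
// caller-chosen) starting centroids and is updated in place.
std::vector<std::size_t> lloydKMeans(const std::vector<Point>& pts,
                                     std::vector<Point>& centroids,
                                     std::size_t maxIter = 100) {
    const std::size_t k = centroids.size();
    assert(k > 0);
    std::vector<std::size_t> assign(pts.size(), k);  // k means "unassigned"
    for (std::size_t it = 0; it < maxIter; ++it) {
        bool changed = false;
        // Assignment step: nearest centroid for each point.
        for (std::size_t p = 0; p < pts.size(); ++p) {
            std::size_t best = 0;
            double bestD = std::numeric_limits<double>::max();
            for (std::size_t c = 0; c < k; ++c) {
                const double d = sqDist(pts[p], centroids[c]);
                if (d < bestD) { bestD = d; best = c; }
            }
            if (assign[p] != best) { assign[p] = best; changed = true; }
        }
        if (!changed) break;  // converged: no point changed cluster
        // Update step: each centroid becomes the mean of its points.
        std::vector<Point> sum(k, Point{0.0, 0.0});
        std::vector<std::size_t> count(k, 0);
        for (std::size_t p = 0; p < pts.size(); ++p) {
            sum[assign[p]][0] += pts[p][0];
            sum[assign[p]][1] += pts[p][1];
            ++count[assign[p]];
        }
        for (std::size_t c = 0; c < k; ++c)
            if (count[c] > 0)  // an empty cluster keeps its old centroid
                centroids[c] = Point{sum[c][0] / count[c], sum[c][1] / count[c]};
    }
    return assign;
}
```

    Started from points 0, 13 and 20 of kMeansData (one per class), it labels the first 13 points 0, the next 7 points 1 and the last 10 points 2: the same three groups as the mlpack output above, with different label numbers.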
    

    Running kmeans (mlpack 4.0.1)

    Things are even better and easier with a (more) current version of mlpack. After I installed 4.0.1 on Ubuntu, the header includes simplified a little and the namespaces changed a little (mlpack::kmeans::KMeans becomes mlpack::KMeans, and mlpack::metric::EuclideanDistance becomes mlpack::EuclideanDistance). I also added an R package dependency on RcppEnsmallen (which provides optimization routines). Most importantly, I can build this without linking.

    Updated Code (for mlpack 4.0.1)

    #include <Rcpp/Rcpp>
    #include <mlpack.h>
    
    // Adjusted for mlpack 4.0.1
    #include <mlpack/core.hpp>
    #include <mlpack/methods/kmeans.hpp>
    #include <mlpack/methods/kmeans/random_partition.hpp>
    #include <mlpack/methods/neighbor_search/neighbor_search.hpp>
    
    // [[Rcpp::depends(RcppArmadillo)]]
    // [[Rcpp::depends(RcppEnsmallen)]]
    // [[Rcpp::depends(mlpack)]]
    // [[Rcpp::plugins(cpp14)]]
    
    // This is 'borrowed' from mlpack's own unit tests in
    // src/mlpack/tests/kmeans_test.cpp: we borrow the data set and the
    // code from the first test function. Passing data from R is easy thanks
    // to RcppArmadillo, and is 'left as an exercise'.
    
    // Generate dataset; written transposed because it's easier to read.
    arma::mat kMeansData("  0.0   0.0;" // Class 1.
                         "  0.3   0.4;"
                         "  0.1   0.0;"
                         "  0.1   0.3;"
                         " -0.2  -0.2;"
                         " -0.1   0.3;"
                         " -0.4   0.1;"
                         "  0.2  -0.1;"
                         "  0.3   0.0;"
                         " -0.3  -0.3;"
                         "  0.1  -0.1;"
                         "  0.2  -0.3;"
                         " -0.3   0.2;"
                         " 10.0  10.0;" // Class 2.
                         " 10.1   9.9;"
                         "  9.9  10.0;"
                         " 10.2   9.7;"
                         " 10.2   9.8;"
                         "  9.7  10.3;"
                         "  9.9  10.1;"
                         "-10.0   5.0;" // Class 3.
                         " -9.8   5.1;"
                         " -9.9   4.9;"
                         "-10.0   4.9;"
                         "-10.2   5.2;"
                         "-10.1   5.1;"
                         "-10.3   5.3;"
                         "-10.0   4.8;"
                         " -9.6   5.0;"
                         " -9.8   5.1;");
    
    
    // [[Rcpp::export]]
    arma::Row<size_t> kmeansDemo() {
    
        mlpack::KMeans<mlpack::EuclideanDistance, mlpack::RandomPartition> kmeans;
    
        arma::Row<size_t> assignments;
        kmeans.Cluster((arma::mat) trans(kMeansData), 3, assignments);
    
        return assignments;
    }
    
    /*** R
    kmeansDemo()
    */
    

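    It builds and runs the same (the same three groups; only the numbering of the second and third cluster labels differs, since RandomPartition starts from a random assignment) and now includes a small amount of default logging: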

    > Rcpp::sourceCpp("answer.cpp")
    
    > kmeansDemo()
    [INFO ] KMeans::Cluster(): iteration 1, residual 14.8221.
    [INFO ] KMeans::Cluster(): iteration 2, residual 1.77636e-15.
    [INFO ] KMeans::Cluster(): converged after 2 iterations.
    [INFO ] 186 distance calculations.
         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30]
    [1,]    2    2    2    2    2    2    2    2    2     2     2     2     2     0     0     0     0     0     0     0     1     1     1     1     1     1     1     1     1     1
    >
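    To check that two label vectors, such as the 3.4.2 and 4.0.1 outputs above, describe the same clustering up to a one-to-one relabeling, one can compare partitions rather than raw labels. A small stdlib sketch (the helper name samePartition is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Returns true if label vectors a and b induce the same partition of the
// points, i.e. a one-to-one relabeling maps a onto b.
bool samePartition(const std::vector<std::size_t>& a,
                   const std::vector<std::size_t>& b) {
    if (a.size() != b.size()) return false;
    std::map<std::size_t, std::size_t> aToB, bToA;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // emplace() keeps an existing mapping; a conflict in either
        // direction means the two groupings differ.
        auto fwd = aToB.emplace(a[i], b[i]).first;
        auto bwd = bToA.emplace(b[i], a[i]).first;
        if (fwd->second != b[i] || bwd->second != a[i]) return false;
    }
    return true;
}
```

    Applied to the two outputs above (13 twos, 7 ones, 10 zeros versus 13 twos, 7 zeros, 10 ones) it returns true; moving a single point to another group makes it return false.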