I am trying to develop an R package using the kmeans functionality from RcppMLPACK.
I am including the header part below:
#include <RcppArmadillo.h>
#include <RcppMLPACK.h>
#include <RcppGSL.h>
#include <RcppDist.h>
#include <sstream>
#include <iostream>
#include <fstream>
#include<omp.h>
#include<gsl/gsl_math.h>
#include<gsl/gsl_rng.h>
#include<gsl/gsl_randist.h>
#include<gsl/gsl_sf.h>
// [[Rcpp::depends(RcppProgress)]]
#include <progress.hpp>
#include <progress_bar.hpp>
// [[Rcpp::depends(RcppArmadillo,RcppDist)]]
// [[Rcpp::depends(RcppMLPACK)]]
// [[Rcpp::depends(RcppGSL)]]
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::plugins(openmp)]]
using namespace mlpack::kmeans ;
using namespace arma;
My Makevars file is given below:
CXX_STD = CXX17
GSL_CFLAGS=`${R_HOME}/bin/Rscript -e "RcppGSL:::CFlags()" 4`
GSL_LIBS=`${R_HOME}/bin/Rscript -e "RcppGSL:::LdFlags()"`
RCPP_LDFLAGS=`${R_HOME}/bin/Rscript -e "Rcpp:::LdFlags()"`
PKG_CXXFLAGS = $(SHLIB_OPENMP_CXXFLAGS) $(GSL_CFLAGS)
PKG_LIBS = $(SHLIB_OPENMP_CXXFLAGS) $(LAPACK_LIBS) $(BLAS_LIBS) $(FLIBS) $(GSL_LIBS) $(RCPP_LDFLAGS)
I am using macOS Ventura. When I try to build my R package, it shows the following error:
In file included from /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/RcppMLPACK/include/mlpack/core.hpp:171,
from /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/RcppMLPACK/include/RcppMLPACK.h:4,
from RcppExports.cpp:6:
/Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/RcppMLPACK/include/mlpack/prereqs.hpp:46:10: fatal error: boost/math/special_functions/gamma.hpp: No such file or directory
>> 46 | #include <boost/math/special_functions/gamma.hpp>
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
However, if I simply run Rcpp::sourceCpp on my C++ file, then it compiles perfectly. Kindly help me in debugging the issue.
P.S. I am using gcc instead of clang. Both boost and mlpack are installed on my system.
The topic is a little underdocumented: mlpack is a large package and contains a lot, but there is no 'quick start' from R. At the same time, your question may have overcomplicated things by including several kitchen sinks' worth of libraries. I find that adding too much too early muddles things.
So here is what I did (using mlpack 3.4.2, see below for mlpack 4.0.1):
I first created a minimal C++ file including just the two headers and not doing much.
It looked like this, give or take:
#include <Rcpp/Rcpp>
#include <mlpack.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::depends(mlpack)]]
// [[Rcpp::export]]
void foo() {
Rcpp::Rcout << "Foo\n";
}
/*** R
foo()
*/
That this compiles means that mlpack is found. I have the CRAN package installed.
This gets a little more complicated for me as I happen to (mainly) work on Ubuntu 22.10, which only has an older mlpack 3.4.2 as a convenient system library from the distribution. I think that with a newer mlpack release 4.* I would not need to link.
As I often do I took a simple example from the unit tests. It has data, as well as an invocation. The full file now is as follows:
#include <Rcpp/Rcpp>
#include <mlpack.h>
// Two include directories adjusted for my use of mlpack 3.4.2 on Ubuntu
#include <mlpack/core.hpp>
#include <mlpack/methods/kmeans/kmeans.hpp>
#include <mlpack/methods/kmeans/random_partition.hpp>
#include <mlpack/methods/neighbor_search/neighbor_search.hpp>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::depends(mlpack)]]
// This is 'borrowed' from mlpack's own src/mlpack/tests/kmeans_test.cpp.
// We borrow the data set, and the code from the first test function.
// Passing data from R is easy thanks to RcppArmadillo, 'and left as an
// exercise'.
// Generate dataset; written transposed because it's easier to read.
arma::mat kMeansData(" 0.0 0.0;" // Class 1.
" 0.3 0.4;"
" 0.1 0.0;"
" 0.1 0.3;"
" -0.2 -0.2;"
" -0.1 0.3;"
" -0.4 0.1;"
" 0.2 -0.1;"
" 0.3 0.0;"
" -0.3 -0.3;"
" 0.1 -0.1;"
" 0.2 -0.3;"
" -0.3 0.2;"
" 10.0 10.0;" // Class 2.
" 10.1 9.9;"
" 9.9 10.0;"
" 10.2 9.7;"
" 10.2 9.8;"
" 9.7 10.3;"
" 9.9 10.1;"
"-10.0 5.0;" // Class 3.
" -9.8 5.1;"
" -9.9 4.9;"
"-10.0 4.9;"
"-10.2 5.2;"
"-10.1 5.1;"
"-10.3 5.3;"
"-10.0 4.8;"
" -9.6 5.0;"
" -9.8 5.1;");
// [[Rcpp::export]]
arma::Row<size_t> kmeansDemo() {
    mlpack::kmeans::KMeans<mlpack::metric::EuclideanDistance,
                           mlpack::kmeans::RandomPartition> kmeans;
    arma::Row<size_t> assignments;
    kmeans.Cluster((arma::mat) trans(kMeansData), 3, assignments);
    return assignments;
}
/*** R
kmeansDemo()
*/
Now, because I am on mlpack 3.4.2, I have to link, so I also need to run Sys.setenv("PKG_LIBS"="-lmlpack"). I also had to adjust the headers slightly from the example I took from the repo, where it is set up for mlpack 4.1.*.
The link step will vary depending on where you are running this.
But with that, my R session produces the result:
> Sys.setenv("PKG_LIBS"="-lmlpack")
> Rcpp::sourceCpp("~/git/stackoverflow/76319284/answer.cpp")
> kmeansDemo()
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30]
[1,] 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
>
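The integer labels in that result are arbitrary: the 13, 7 and 10 points of the three classes each share one label, but which label a class receives depends on the random initial partition. As a small sketch in plain C++ (the helper name samePartition is made up for illustration and is not part of mlpack), two assignment vectors can be compared while ignoring the label numbering:

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical helper, not an mlpack function: returns true when two label
// vectors describe the same clustering, regardless of how the clusters
// happen to be numbered.
bool samePartition(const std::vector<std::size_t>& a,
                   const std::vector<std::size_t>& b) {
    if (a.size() != b.size()) return false;
    // Build a one-to-one mapping between labels of a and labels of b.
    std::unordered_map<std::size_t, std::size_t> aToB, bToA;
    for (std::size_t i = 0; i < a.size(); ++i) {
        // emplace() keeps an existing entry, so a mismatch means the
        // label was already paired with a different partner.
        auto fwd = aToB.emplace(a[i], b[i]);
        if (fwd.first->second != b[i]) return false;
        auto bwd = bToA.emplace(b[i], a[i]);
        if (bwd.first->second != a[i]) return false;
    }
    return true;
}
```

Something along these lines is handy when comparing results across runs or mlpack versions, where comparing the raw label vectors element by element would report spurious differences.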
Things are even better and easier with a (more) current version of mlpack. After I installed 4.0.1 on Ubuntu, the header includes simplified a little, the namespace changed a little, and I added an R package dependency on RcppEnsmallen (which provides optimization routines). Most importantly, I can build this without linking.
#include <Rcpp/Rcpp>
#include <mlpack.h>
// Adjusted for mlpack 4.0.1
#include <mlpack/core.hpp>
#include <mlpack/methods/kmeans.hpp>
#include <mlpack/methods/kmeans/random_partition.hpp>
#include <mlpack/methods/neighbor_search/neighbor_search.hpp>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::depends(RcppEnsmallen)]]
// [[Rcpp::depends(mlpack)]]
// [[Rcpp::plugins(cpp14)]]
// This is 'borrowed' from mlpack's own src/mlpack/tests/kmeans_test.cpp.
// We borrow the data set, and the code from the first test function.
// Passing data from R is easy thanks to RcppArmadillo, 'and left as an
// exercise'.
// Generate dataset; written transposed because it's easier to read.
arma::mat kMeansData(" 0.0 0.0;" // Class 1.
" 0.3 0.4;"
" 0.1 0.0;"
" 0.1 0.3;"
" -0.2 -0.2;"
" -0.1 0.3;"
" -0.4 0.1;"
" 0.2 -0.1;"
" 0.3 0.0;"
" -0.3 -0.3;"
" 0.1 -0.1;"
" 0.2 -0.3;"
" -0.3 0.2;"
" 10.0 10.0;" // Class 2.
" 10.1 9.9;"
" 9.9 10.0;"
" 10.2 9.7;"
" 10.2 9.8;"
" 9.7 10.3;"
" 9.9 10.1;"
"-10.0 5.0;" // Class 3.
" -9.8 5.1;"
" -9.9 4.9;"
"-10.0 4.9;"
"-10.2 5.2;"
"-10.1 5.1;"
"-10.3 5.3;"
"-10.0 4.8;"
" -9.6 5.0;"
" -9.8 5.1;");
// [[Rcpp::export]]
arma::Row<size_t> kmeansDemo() {
    mlpack::KMeans<mlpack::EuclideanDistance, mlpack::RandomPartition> kmeans;
    arma::Row<size_t> assignments;
    kmeans.Cluster((arma::mat) trans(kMeansData), 3, assignments);
    return assignments;
}
/*** R
kmeansDemo()
*/
It of course builds and runs the same and now includes a small amount of default logging:
> Rcpp::sourceCpp("answer.cpp")
> kmeansDemo()
[INFO ] KMeans::Cluster(): iteration 1, residual 14.8221.
[INFO ] KMeans::Cluster(): iteration 2, residual 1.77636e-15.
[INFO ] KMeans::Cluster(): converged after 2 iterations.
[INFO ] 186 distance calculations.
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30]
[1,] 2 2 2 2 2 2 2 2 2 2 2 2 2 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
>
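Since the question is about a package build: the // [[Rcpp::depends(...)]] comments are only acted on by Rcpp::sourceCpp(). In a package, the include paths come from the LinkingTo field of DESCRIPTION instead, which would explain a file that compiles under sourceCpp() but misses headers when the package is built. A minimal sketch of the DESCRIPTION lines matching the mlpack 4 example above, assuming an otherwise standard Rcpp package (these field values are illustrative, not taken from the original setup):

```
Imports: Rcpp
LinkingTo: Rcpp, RcppArmadillo, RcppEnsmallen, mlpack
```

For the RcppMLPACK-based setup in the question, the missing boost/math header suggests that adding BH to LinkingTo may help, but that is an untested assumption.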