I am writing an R package for statistical analysis and machine learning (ML) that is often very slow. It is slow because it involves training and predicting models, both statistical and machine learning. My package is model-agnostic, by which I mean that it interfaces with any other model training and prediction package from R to retrain their models and use their models to make predictions. After extensive profiling and code refactoring (mainly by converting as much as possible to vectorized and matrix operations), I have found that the slow points that I cannot speed up further by refactoring come down to code that:
I would like to know if Rcpp can help speed things up in my situation. Please note what I am not asking here:
My key doubt about whether Rcpp can help is that the slowest code is where I call other packages' R functions. I've been reading up a lot on Rcpp and I'm even taking the DataCamp course on that topic. However, from my current exploration of Rcpp, although many sources explain why we would want to use Rcpp (to speed up slow R code), I have been unable to find any source that clearly spells out what kinds of problems Rcpp cannot help with.
From what I've gathered, Rcpp cannot provide any speed-up when it calls R functions. The functions that are slowing down my code are those written by other packages. For example, I have an article that demonstrates my package functionality using nnet::nnet() and nnet::predict.nnet() to train and predict a neural network, respectively, and gbm::gbm() and gbm::predict.gbm() to train and predict a gradient boosted machine, respectively. Is there any way to use Rcpp to optimize the calls to these functions?
If I could call Rcpp::cppFunction() in real-time to receive these functions, compile them to C++, and then continue to execute them with my program, then that could be a viable solution. But is that even possible with Rcpp? I would appreciate any guidance here. And I am willing to accept a clearly explained answer of, "No, Rcpp cannot help in your case, and here's why."
It appears some aspects are being confused here:
- R functions can be called from C++ via Rcpp::Function. It is documented that this used to have more prohibitive overhead in the very early days (and that is what the previous answer refers to), but it got much better several years ago (see the release NEWS or ChangeLog). There is still some overhead, but it is entirely viable, and many packages do that for some tasks (including Rcpp itself).
- There is no need to call cppFunction() repeatedly, or at each start of a package: when Rcpp code helps, it is trivial to wrap it in a package.

Most importantly, it generally pays off to follow a general rule: do not conjecture but rather profile and measure.
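
As a rough illustration of "profile and measure", the sketch below times a model-agnostic wrapper against the same model calls made directly. Everything in it is an assumption made for illustration: fit_and_predict() is a hypothetical stand-in for the package's wrapper, the data are simulated, and stats::lm() replaces nnet::nnet() or gbm::gbm() so that it runs with base R only.

```r
# Hypothetical model-agnostic wrapper (not from the original post): it takes
# the training and prediction functions as arguments.
fit_and_predict <- function(train_fun, predict_fun, data) {
  model <- train_fun(data)
  predict_fun(model, data)
}

# Simulated data; stats::lm() stands in for nnet::nnet() or gbm::gbm()
# so that the sketch runs with base R only.
set.seed(1)
n <- 20000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(n)

train_lm   <- function(data) stats::lm(y ~ x1 + x2, data = data)
predict_lm <- function(model, data) stats::predict(model, newdata = data)

reps <- 20

# Time spent when going through the wrapper.
t_wrapper <- system.time(
  for (i in seq_len(reps)) preds <- fit_and_predict(train_lm, predict_lm, dat)
)[["elapsed"]]

# Time spent in the third-party calls alone, without the wrapper.
t_direct <- system.time(
  for (i in seq_len(reps)) {
    model <- train_lm(dat)
    preds_direct <- predict_lm(model, dat)
  }
)[["elapsed"]]

# If the two timings are close, nearly all the time is spent inside the
# model functions, which rewriting the wrapper in C++ would not change.
cat("via wrapper:", t_wrapper, "s; direct calls:", t_direct, "s\n")
```

If the two timings come out close, the wrapper itself contributes little, and moving it to C++ would still leave the same R-level model calls to be made through Rcpp::Function. For a finer breakdown, Rprof() and summaryRprof() from base R report time per function.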