I am building an app that frequently computes the regularized incomplete beta function. The app is written in C++ and calls R::pbeta()
. When I tried to multithread the app, some warning messages from R::pbeta()
smashed the stack.
So I turned to boost::math::ibeta()
. Everything worked fine until I measured the speed. The following C++ file whyIsBoostSlower.cpp
implements the regularized incomplete beta function using either R::pbeta()
or boost::math::ibeta()
.
// [[Rcpp::plugins(cpp17)]]
#include <boost/math/special_functions/beta.hpp>
// [[Rcpp::depends(BH)]]
#include <Rcpp.h>
using namespace Rcpp;
// Compute the regularized incomplete Beta function.
// [[Rcpp::export]]
NumericVector RIBF(NumericVector q, NumericVector a, NumericVector b,
bool useboost = false)
{
NumericVector rst(q.size());
for (int i = 0, iend = q.size(); i < iend; ++i)
{
if (useboost) rst[i] = boost::math::ibeta( a[i], b[i], q[i] );
else rst[i] = R::pbeta( q[i], a[i], b[i], 1, 0 );
}
return rst;
}
In R, we measure the speed of calling the function 300000 times on random numbers:
Rcpp::sourceCpp("whyIsBoostSlower.cpp")
set.seed(123)
N = 300000L
q = runif(N) # Generate quantiles.
a = runif(N, 0, 10) # Generate a in (0, 10)
b = runif(N, 0, 10) # Generate b in (0, 10)
# Use R's pbeta(). This function calls a C wrapper of toms708.c:
# https://svn.r-project.org/R/trunk/src/nmath/toms708.c
system.time({ Rrst = RIBF(q, a, b, useboost = F) })
# Windows 10 (seconds):
# user system elapsed
# 0.11 0.00 0.11
# RedHat Linux:
# user system elapsed
# 0.097 0.000 0.097
# Use Boost's implementation, which also depends on TOMS 708 by their claim:
# https://www.boost.org/doc/libs/1_41_0/libs/math/doc/sf_and_dist/html/math_toolkit/special/sf_beta/ibeta_function.html
system.time({ boostRst = RIBF(q, a, b, useboost = T) })
# Windows 10:
# user system elapsed
# 0.52 0.00 0.52
# RedHat Linux:
# user system elapsed
# 0.988 0.001 0.993
range(Rrst - boostRst)
# -1.221245e-15 1.165734e-15
To reproduce the example, one needs to install R and package Rcpp
. On Windows, one also needs to install Rtools
which contains a GCC distribution. The optimization flag is default to -O2
.
Both R::pbeta()
and boost::math::ibeta()
are based on ACM TOMS 708, yet boost::math::ibeta()
is 5x slower on Windows and 10x slower on Linux.
I think it might have something to do with setting the Policy
argument in boost::math::ibeta()
, but how?
Thank you!
FYI, R::pbeta()
is defined in R-4.2.3/src/nmath/pbeta.c
. R::pbeta()
calls bratio()
which is defined in R-4.2.3/src/nmath/toms708.c
, namely https://svn.r-project.org/R/trunk/src/nmath/toms708.c . Code inside is C translation of TOMS 708 Fortran code. The translation is done by R's core team.
In contrast, Boost states "This implementation is closely based upon "Algorithm 708; Significant digit computation of the incomplete beta function ratios", DiDonato and Morris, ACM, 1992." on boost::math::ibeta()
As there are Beta, incomplete Beta as well as a Beta distribution (which you hit with R::pbeta()
) I want to ensure we compare apples with apples.
So a modified version of your code, here with two distinct functions for simplicity---as well as a comparison the GSL---and formal benchmark call:
// [[Rcpp::depends(BH)]]
#include <boost/math/special_functions/beta.hpp>
// this also ensure linking with the GSL
// [[Rcpp::depends(RcppGSL)]]
#include <gsl/gsl_sf_gamma.h>
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector bfR(NumericVector a, NumericVector b) {
int n = a.size();
NumericVector rst(n);
for (int i = 0; i<n; i++) {
rst[i] = R::beta(a[i], b[i]);
}
return rst;
}
// [[Rcpp::export]]
NumericVector bfB(NumericVector a, NumericVector b) {
int n = a.size();
NumericVector rst(n);
for (int i = 0; i<n; i++) {
rst[i] = boost::math::beta( a[i], b[i] );
}
return rst;
}
// [[Rcpp::export]]
NumericVector bfG(NumericVector a, NumericVector b) {
int n = a.size();
NumericVector rst(n);
for (int i = 0; i<n; i++) {
rst[i] = gsl_sf_beta( a[i], b[i] );
}
return rst;
}
/*** R
set.seed(123)
N <- 300000L
a <- runif(N, 0, 10) # Generate a in (0, 10)
b <- runif(N, 0, 10) # Generate b in (0, 10)
summary(bfR(a,b) - bfB(a,b))
summary(bfR(a,b) - bfG(a,b))
microbenchmark::microbenchmark(R = bfR(a, b), Boost = bfB(a, b), GSL = bfG(a, b), times=10)
*/
When we Rcpp::sourceCpp()
the 'marked' R code section also gets executed:
> Rcpp::sourceCpp("answer.cpp")
> set.seed(123)
> N <- 300000L
> a <- runif(N, 0, 10) # Generate a in (0, 10)
> b <- runif(N, 0, 10) # Generate b in (0, 10)
> summary(bfR(a,b) - bfB(a,b))
Min. 1st Qu. Median Mean 3rd Qu. Max.
-3.64e-12 0.00e+00 0.00e+00 -5.00e-17 0.00e+00 1.36e-12
> summary(bfR(a,b) - bfG(a,b))
Min. 1st Qu. Median Mean 3rd Qu. Max.
-1.18e-11 -4.00e-17 0.00e+00 2.10e-16 0.00e+00 1.09e-11
> microbenchmark::microbenchmark(R = bfR(a, b), Boost = bfB(a, b), GSL = bfG(a, b), times=10)
Unit: milliseconds
expr min lq mean median uq max neval cld
R 44.9314 45.2773 46.9782 46.1237 49.0056 50.6273 10 a
Boost 166.0146 167.2552 171.0441 169.5741 175.0108 180.6520 10 b
GSL 58.3259 58.5101 61.0364 59.6556 62.4862 67.5316 10 c
>
At this point I can only guess that Boost either does some extra hoops, or suffers some costs from abstraction as it looses to both R and the GSL. (And I note that on its documentation page the results are compared (for accurracy) to the GNU GSL as well as to R: https://www.boost.org/doc/libs/1_82_0/libs/math/doc/html/math_toolkit/sf_beta/beta_function.html)