Suppose I have the following list of vectors:
List <- list(c(1:3), c(4:6), c(7:9))
To get the sum of each vector, I have the following Rcpp code:
library(Rcpp)
totalCpp <- '
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
List t_list(List r_list) {
  List results;
  for (int i = 0; i < r_list.size(); ++i) {
    NumericVector vec = as<NumericVector>(r_list[i]);
    double total = 0;  // double, not int: NumericVector elements are doubles
    for (int j = 0; j < vec.size(); ++j) {
      total += vec[j];
    }
    results.push_back(total);  // Add the sum to the results list
  }
  return results;
}
'
sourceCpp(code = totalCpp)
which returns the following
> t_list(List)
[[1]]
[1] 6
[[2]]
[1] 15
[[3]]
[1] 24
Is it possible to write this Rcpp code without two nested for loops, or is there a more elegant way to do it in Rcpp?
{Rcpp} has a built-in sum() (part of Rcpp sugar) that replaces the inner loop:
library(inline)
builtin_sum <- cxxfunction(
  signature(r_list = "list"),
  body = '
    List input_list(r_list);
    List results;
    for (int i = 0; i < input_list.size(); ++i) {
      NumericVector vec = as<NumericVector>(input_list[i]);
      double vec_sum = sum(vec);   // Rcpp sugar sum() replaces the inner loop
      results.push_back(vec_sum);  // still grows the list one element at a time
    }
    return results;
  ',
  plugin = "Rcpp")
Note also that plain R handles this with no C++ at all:
lapply(List, sum)
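The same idea, letting a library reduction do the inner loop, can be shown in plain standard C++ outside of R: std::accumulate plays the role of sum() and std::transform plays the role of lapply(), so no loop is written by hand. This is only an illustrative sketch, assuming the list has already been converted to a std::vector<std::vector<double>>; the name row_sums is hypothetical and not part of Rcpp.

```cpp
#include <algorithm>  // std::transform
#include <numeric>    // std::accumulate
#include <vector>

// Hypothetical helper (not an Rcpp API): sum each inner vector
// without any hand-written loop.
std::vector<double> row_sums(const std::vector<std::vector<double>>& lists) {
    std::vector<double> out(lists.size());  // pre-allocated output
    std::transform(lists.begin(), lists.end(), out.begin(),
                   [](const std::vector<double>& v) {
                       // 0.0 (not 0) so the running total is a double, not an int
                       return std::accumulate(v.begin(), v.end(), 0.0);
                   });
    return out;
}
```

For example, row_sums({{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}) gives {6, 15, 24}, matching t_list(List). The 0.0 initial value matters because std::accumulate accumulates in the type of its init argument; passing 0 would truncate, the same pitfall as an int accumulator.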
To be more elegant and also gain performance, pre-allocate the result and assign into it directly instead of calling push_back. Note that this version returns a numeric vector rather than a list:
improved_sum <- cxxfunction(
  signature(r_list = "list"),
  body = '
    List input_list(r_list);
    int n = input_list.size();
    NumericVector results(n);  // Pre-allocate numeric vector
    for (int i = 0; i < n; ++i) {
      NumericVector vec = input_list[i];
      results[i] = sum(vec);   // Direct assignment, no push_back
    }
    return results;
  ',
  plugin = "Rcpp")
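Why does pre-allocation help so much? Rcpp's push_back does not grow a vector in place: it allocates a new vector one element longer and copies the existing elements across. Building n results that way costs 0 + 1 + ... + (n - 1) = n(n - 1)/2 element copies, which is why swapping the inner loop for sum() alone barely matters while push_back is still there. The toy model below, in plain C++, only counts those copies; append_by_copy, copies_when_appending and fill_preallocated are hypothetical names, and the model is an assumption-level simplification, not Rcpp's actual implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of an append that cannot grow in place:
// allocate a vector one element longer, copy the old elements, add the new one.
// Returns how many existing elements had to be copied.
std::size_t append_by_copy(std::vector<double>& v, double x) {
    std::vector<double> grown(v.size() + 1);
    for (std::size_t k = 0; k < v.size(); ++k) grown[k] = v[k];  // O(current size)
    grown[v.size()] = x;
    std::size_t copied = v.size();
    v.swap(grown);
    return copied;
}

// Total element copies needed to build a length-n result by repeated appends.
std::size_t copies_when_appending(std::size_t n) {
    std::vector<double> v;
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i) total += append_by_copy(v, static_cast<double>(i));
    return total;
}

// Pre-allocated version: one allocation up front, each result written directly.
std::vector<double> fill_preallocated(std::size_t n) {
    std::vector<double> v(n);
    for (std::size_t i = 0; i < n; ++i) v[i] = static_cast<double>(i);  // no copying
    return v;
}
```

Under this model, copies_when_appending(10000) is 49,995,000 (10000 × 9999 / 2), while fill_preallocated(10000) copies nothing beyond its single allocation.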
Here's a benchmark on a list of 10,000 integer vectors of length 50 (two_loops is the question's t_list()):
set.seed(42)
large_list <- replicate(10000, sample(1:100, 50), simplify = FALSE)
microbenchmark::microbenchmark(
  lapply = lapply(large_list, sum),
  two_loops = t_list(large_list),
  builtin_sum = builtin_sum(large_list),
  improved = improved_sum(large_list),
  times = 100
) -> res
res
ggplot2::autoplot(res) +
ggplot2::theme_bw()
Unit: milliseconds
        expr      min       lq       mean    median        uq      max neval cld
      lapply   2.4638   2.7633   3.224807   3.04370   3.51925   5.6379   100  a
   two_loops 265.7754 307.4380 327.912011 320.43895 336.63080 631.5728   100   b
 builtin_sum 273.9828 309.8691 328.088739 324.40175 336.75415 608.7544   100   b
    improved   1.5470   1.7755   2.390364   1.89355   2.12300  19.0634   100  a