Sum the vectors stored in the list using Rcpp


Suppose I have the following list of vectors:

List <- list(c(1:3), c(4:6), c(7:9))

To get the sum of each vector, I have the following Rcpp code:

library(Rcpp)

totalCpp <- "#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
List t_list(List r_list) {
  List results;
  for (int i = 0; i < r_list.size(); ++i) {
    NumericVector vec = as<NumericVector>(r_list[i]);
    double sum = 0;  // double, so non-integer values are not truncated
    for (int j = 0; j < vec.size(); ++j) {
      sum += vec[j];
    }
    results.push_back(sum); // Add the sum to the results list
  }
  return results;
}
"
sourceCpp(code = totalCpp)

which returns the following

> t_list(List)
[[1]]
[1] 6

[[2]]
[1] 15

[[3]]
[1] 24

Is it possible to write this Rcpp code without two for loops, or is there a more elegant way to write it in Rcpp?


Solution

  • {Rcpp} has a built-in sum() (part of Rcpp sugar) that replaces the inner loop:

    library(inline)
    
    builtin_sum <- cxxfunction(
      signature(r_list = "list"), 
      body = '
       List input_list(r_list);
       List results;
       for (int i = 0; i < input_list.size(); ++i) {
         NumericVector vec = as<NumericVector>(input_list[i]);
         double vec_sum = sum(vec);
         results.push_back(vec_sum);
       }
       return results;
     ', 
      plugin = "Rcpp")
    

    Apart from that, plain lapply() in base R already does the job here:

    lapply(List, sum)
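
The "apply sum to every element" shape that lapply() expresses can also be written in C++ with no explicit loop at all. The following is a minimal sketch in plain standard C++, not Rcpp: std::vector stands in for List and NumericVector, and sum_each is a hypothetical name chosen for illustration. Rcpp vectors also expose begin() and end(), so the same std::accumulate call should carry over to a NumericVector.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of Rcpp): returns the sum of each inner vector.
std::vector<double> sum_each(const std::vector<std::vector<double>>& input) {
    std::vector<double> totals(input.size());  // one pre-allocated slot per inner vector
    std::transform(input.begin(), input.end(), totals.begin(),
                   [](const std::vector<double>& v) {
                       // Start from 0.0 (not 0) so the accumulation stays in double.
                       return std::accumulate(v.begin(), v.end(), 0.0);
                   });
    return totals;
}
```

Called on {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}, this returns 6, 15 and 24, the same values as in the question.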
    

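    If we want something more elegant that also gains real performance, we can pre-allocate the result and assign into it directly instead of calling push_back(). R vectors cannot grow in place, so every push_back() has to copy everything stored so far. Note that this version returns a numeric vector rather than a list.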

    improved_sum <- cxxfunction(
      signature(r_list = "list"),
      body = '
        List input_list(r_list);
        int n = input_list.size();
        NumericVector results(n);  // Pre-allocate numeric vector
                                 
        for (int i = 0; i < n; ++i) {
          NumericVector vec = input_list[i];
          results[i] = sum(vec);  // Direct assignment, no push_back
        }
        return results;
        ', 
      plugin = "Rcpp")
    

    Here's a benchmark, where two_loops is the question's two-loop function (named t_list there):

    set.seed(42)
    large_list <- replicate(10000, sample(1:100, 50), simplify = FALSE)
    
    microbenchmark::microbenchmark(
      lapply = lapply(large_list, sum),
      two_loops = two_loops(large_list),
      builtin_sum = builtin_sum(large_list),
      improved = improved_sum(large_list),
      times = 100
    ) -> res
    
    res
    ggplot2::autoplot(res) +
      ggplot2::theme_bw()
    
    Unit: milliseconds
           expr      min       lq       mean    median        uq      max neval cld
         lapply   2.4638   2.7633   3.224807   3.04370   3.51925   5.6379   100   a 
      two_loops 265.7754 307.4380 327.912011 320.43895 336.63080 631.5728   100   b
    builtin_sum 273.9828 309.8691 328.088739 324.40175 336.75415 608.7544   100   b
       improved   1.5470   1.7755   2.390364   1.89355   2.12300  19.0634   100   a
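
The gap between the two push_back versions and improved_sum is consistent with push_back copying the existing elements on every call. The sketch below is a toy model in plain C++, not Rcpp's actual implementation: it counts how many element copies a "grow by one" append needs compared with a pre-allocated buffer. The function names are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Toy model, NOT Rcpp's real code: every append allocates a buffer that is one
// element larger and copies all existing elements into it.
// Returns the total number of element copies needed to build n elements.
std::size_t copies_grow_by_one(std::size_t n) {
    std::vector<int> buf;
    std::size_t copies = 0;
    for (std::size_t i = 0; i < n; ++i) {
        std::vector<int> bigger(buf.size() + 1);
        for (std::size_t j = 0; j < buf.size(); ++j) {
            bigger[j] = buf[j];  // copy an element that was already stored
            ++copies;
        }
        bigger[buf.size()] = static_cast<int>(i);  // write the new element
        buf.swap(bigger);
    }
    return copies;
}

// Pre-allocated version: one allocation, each slot written once, nothing copied.
std::size_t copies_preallocated(std::size_t n) {
    std::vector<int> buf(n);
    std::size_t copies = 0;
    for (std::size_t i = 0; i < n; ++i) {
        buf[i] = static_cast<int>(i);
    }
    return copies;
}
```

For a result of length 10000, as in the benchmark, the grow-by-one model performs 0 + 1 + ... + 9999 = 49,995,000 element copies, while the pre-allocated version performs none. The work grows quadratically with the list length in the first case and linearly in the second.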