rperformancecombinatorics

All combinations 1 to n_1, 2 to n_2, ..., n to n_n in R as vectors in a list?


I am looking for the most efficient way to create combinations based on n different series in R.

Base R has this nice function called expand.grid which returns all the combinations as a data frame, but I need each and every combination as vector as a separate list elements.

So from there it should not be - and it is not - a difficult task to achieve what I want, but the fact that first I need to create a data frame seems to be an unneeded step in this process.

Let's have an example: Let's assume I want all the combinations, where the first element is 1,2,3 or 4, the second one is 1 or 2, and the third one is 1,2 or 3. The input should be the highest integer in the series:

library(dplyr)
c(4,2,3) %>% #This is the input; the length can be anything, here it happens to be 3
    lapply(\(elem) seq(elem)) %>% #Here we create the sequences: 1,2,3,4 & 1,2 & 1,2,3
    expand.grid %>% #A data frame with all the possible combinations
    {split(unlist(.),seq(nrow(.)))} #We unlist the whole data frame into one vector, and split it into 1,2,...,nrow(data frame) equally sized vectors, which ultimately become list elements

So, is there a more efficient way to achieve this?


Solution

  • (I just added this in order to remove the down vote, Aku-Ville Lehtimäki)

    If looking for efficiency, consider writing the code in C++:

    Rcpp::cppFunction("
    std::vector<std::vector<int>> product(std::vector<int> x){
      std::vector<std::vector<int>> result = {{}};
      for (auto i: x){
          std::vector<std::vector<int>> sub_result;
          for (auto j: result) for (auto k = 1; k<=i;k++) {
              std::vector<int> row(j);
              row.push_back(k);
              sub_result.push_back(row);
          }
          result = sub_result;
      }
      return result;
    }
    ")
    
    product(1:3)
    [[1]]
    [1] 1 1 1
    
    [[2]]
    [1] 1 1 2
    
    [[3]]
    [1] 1 1 3
    
    [[4]]
    [1] 1 2 1
    
    [[5]]
    [1] 1 2 2
    
    [[6]]
    [1] 1 2 3
    

    Edit

    I believe its possible to only use indexing instead of data copying which will significantly improve the speed. Here is a way to achieve half of the indexing:

    Rcpp::cppFunction("
    std::vector<std::vector<int>> product_2(std::vector<int> x){
      std::vector<std::vector<int>> result = {{}};
      for (auto i: x){
          std::vector<std::vector<int>> sub_result;
          for (auto j: result) {
            int n = j.size();
            j.resize(n+1);
            for (auto k = 1; k<=i;k++) {
              j[n] = k;
              sub_result.push_back(j);
            }
          }
          result = sub_result;
      }
      return result;
    }
    ")
    
    microbenchmark::microbenchmark(product(n), product_2(n), check = 'equal')
    Unit: milliseconds
             expr    min      lq     mean  median      uq     max neval
       product(n) 5.3796 5.53625 6.997803 6.17555 7.87635 14.4701   100
     product_2(n) 3.3674 3.55285 4.583415 4.14890 5.50670  8.4493   100
    

    Note that this is not the best, but its the best I could think of so far