I am looking for the most efficient way to create combinations based on n different series in R.
Base R has this nice function called expand.grid which returns all the combinations as a data frame, but I need each and every combination as vector as a separate list elements.
So from there it should not be - and it is not - a difficult task to achieve what I want, but the fact that first I need to create a data frame seems to be an unneeded step in this process.
Let's have an example: Let's assume I want all the combinations, where the first element is 1,2,3 or 4, the second one is 1 or 2, and the third one is 1,2 or 3. The input should be the highest integer in the series:
library(dplyr)
c(4,2,3) %>% #This is the input; the length can be anything, here it happens to be 3
lapply(\(elem) seq(elem)) %>% #Here we create the sequences: 1,2,3,4 & 1,2 & 1,2,3
expand.grid %>% #A data frame with all the possible combinations
{split(unlist(.),seq(nrow(.)))} #We unlist the whole data frame into one vector, and split it into 1,2,...,nrow(data frame) equally sized vectors, which ultimately become list elements
So, is there a more efficient way to achieve this?
(I just added this in order to remove the down vote, Aku-Ville Lehtimäki)
If looking for efficiency, consider writing the code in C++:
Rcpp::cppFunction("
std::vector<std::vector<int>> product(std::vector<int> x){
std::vector<std::vector<int>> result = {{}};
for (auto i: x){
std::vector<std::vector<int>> sub_result;
for (auto j: result) for (auto k = 1; k<=i;k++) {
std::vector<int> row(j);
row.push_back(k);
sub_result.push_back(row);
}
result = sub_result;
}
return result;
}
")
product(1:3)
[[1]]
[1] 1 1 1
[[2]]
[1] 1 1 2
[[3]]
[1] 1 1 3
[[4]]
[1] 1 2 1
[[5]]
[1] 1 2 2
[[6]]
[1] 1 2 3
I believe its possible to only use indexing instead of data copying which will significantly improve the speed. Here is a way to achieve half of the indexing:
Rcpp::cppFunction("
std::vector<std::vector<int>> product_2(std::vector<int> x){
std::vector<std::vector<int>> result = {{}};
for (auto i: x){
std::vector<std::vector<int>> sub_result;
for (auto j: result) {
int n = j.size();
j.resize(n+1);
for (auto k = 1; k<=i;k++) {
j[n] = k;
sub_result.push_back(j);
}
}
result = sub_result;
}
return result;
}
")
microbenchmark::microbenchmark(product(n), product_2(n), check = 'equal')
Unit: milliseconds
expr min lq mean median uq max neval
product(n) 5.3796 5.53625 6.997803 6.17555 7.87635 14.4701 100
product_2(n) 3.3674 3.55285 4.583415 4.14890 5.50670 8.4493 100
Note that this is not the best, but its the best I could think of so far