My question is related to this one: R: first N of all permutations. In that link, there is a function that takes the first N possible permutations of a set of integers. I want something similar but rather than taking the first N, I want to select a random sample all possible permutations. I would like to do this without having to first list out all possible permutations, as that would be computationally intensive for large sets of integers. How can I do this in R?
These are the fastest methods I've seen:
v <- 1:10 # vector to permute
N <- 1e5 # number of samples
x <- Rfast::colShuffle(matrix(v, length(v), N)) # sampling with replacement
x <- RcppAlgos::permuteSample(v, n = N) # sampling without replacement
Note that the colShuffle
approach will give the samples column-wise, while permuteSample
will give the samples row-wise. Also beware that colShuffle
doesn't respect the global PRNG.
A bonus consideration: if the vector to permute is relatively large (e.g., >20), the probability of repeating a sample when sampling with replacement becomes vanishingly small. In those cases, colShuffle
may be preferable for speed:
library(Rfast)
library(RcppAlgos)
v <- 1:25
N <- 1e6
microbenchmark::microbenchmark(
replacement = colShuffle(matrix(v, length(v), N)),
woreplacment = permuteSample(v, n = N),
times = 10
)
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> replacement 654.9532 665.7043 682.3385 675.38 679.2244 768.7495 10
#> woreplacment 2828.9032 2864.9226 2897.5245 2895.86 2925.7042 2975.3763 10
The probability of seeing a repeat in 1M samples of a permutation of 25:
library(Rmpfr)
n <- factorial(mpfr(length(v), 1024))
N <- mpfr(N, 1024)
as.numeric(-expm1(lgamma(n + 1) - N*log(n) - lgamma(n - N + 1)))
#> [1] 3.223472e-14