[SOLVED] R: take random sample of all possible permutations

R: take random sample of all possible permutations

My question is related to this one: R: first N of all permutations. In that link, there is a function that takes the first N possible permutations of a set of integers. I want something similar but rather than taking the first N, I want to select a random sample all possible permutations. I would like to do this without having to first list out all possible permutations, as that would be computationally intensive for large sets of integers. How can I do this in R?

Solution

These are the fastest methods I've seen:

v <- 1:10 # vector to permute
N <- 1e5 # number of samples

x <- Rfast::colShuffle(matrix(v, length(v), N)) # sampling with replacement
x <- RcppAlgos::permuteSample(v, n = N)         # sampling without replacement

Note that the colShuffle approach will give the samples column-wise, while permuteSample will give the samples row-wise. Also beware that colShuffle doesn't respect the global PRNG.

A bonus consideration: if the vector to permute is relatively large (e.g., >20), the probability of repeating a sample when sampling with replacement becomes vanishingly small. In those cases, colShuffle may be preferable for speed:

library(Rfast)
library(RcppAlgos)

v <- 1:25
N <- 1e6

microbenchmark::microbenchmark(
  replacement = colShuffle(matrix(v, length(v), N)),
  woreplacment = permuteSample(v, n = N),
  times = 10
)
#> Unit: milliseconds
#>          expr       min        lq      mean  median        uq       max neval
#>   replacement  654.9532  665.7043  682.3385  675.38  679.2244  768.7495    10
#>  woreplacment 2828.9032 2864.9226 2897.5245 2895.86 2925.7042 2975.3763    10

The probability of seeing a repeat in 1M samples of a permutation of 25:

library(Rmpfr)

n <- factorial(mpfr(length(v), 1024))
N <- mpfr(N, 1024)
as.numeric(-expm1(lgamma(n + 1) - N*log(n) - lgamma(n - N + 1)))
#> [1] 3.223472e-14