rpermutation

R: take random sample of all possible permutations


My question is related to this one: R: first N of all permutations. In that link, there is a function that takes the first N possible permutations of a set of integers. I want something similar but rather than taking the first N, I want to select a random sample all possible permutations. I would like to do this without having to first list out all possible permutations, as that would be computationally intensive for large sets of integers. How can I do this in R?


Solution

  • These are the fastest methods I've seen:

    v <- 1:10 # vector to permute
    N <- 1e5 # number of samples
    
    x <- Rfast::colShuffle(matrix(v, length(v), N)) # sampling with replacement
    x <- RcppAlgos::permuteSample(v, n = N)         # sampling without replacement
    

    Note that the colShuffle approach will give the samples column-wise, while permuteSample will give the samples row-wise. Also beware that colShuffle doesn't respect the global PRNG.


    A bonus consideration: if the vector to permute is relatively large (e.g., >20), the probability of repeating a sample when sampling with replacement becomes vanishingly small. In those cases, colShuffle may be preferable for speed:

    library(Rfast)
    library(RcppAlgos)
    
    v <- 1:25
    N <- 1e6
    
    microbenchmark::microbenchmark(
      replacement = colShuffle(matrix(v, length(v), N)),
      woreplacment = permuteSample(v, n = N),
      times = 10
    )
    #> Unit: milliseconds
    #>          expr       min        lq      mean  median        uq       max neval
    #>   replacement  654.9532  665.7043  682.3385  675.38  679.2244  768.7495    10
    #>  woreplacment 2828.9032 2864.9226 2897.5245 2895.86 2925.7042 2975.3763    10
    

    The probability of seeing a repeat in 1M samples of a permutation of 25:

    library(Rmpfr)
    
    n <- factorial(mpfr(length(v), 1024))
    N <- mpfr(N, 1024)
    as.numeric(-expm1(lgamma(n + 1) - N*log(n) - lgamma(n - N + 1)))
    #> [1] 3.223472e-14