Tags: arrays, r, combinatorics, permute

Permutations of 3 elements within 6 positions


I'm looking to permute (or combine) c("a","b","c") across six positions, under the condition that the resulting sequences always alternate elements (no two identical neighbors), e.g. abcbab.

The permutations themselves are easy to get with:

library(gtools)
abc <- c("a", "b", "c")
permutations(n = 3, r = 6, v = abc, repeats.allowed = TRUE)

I don't think the alternating condition can be imposed with gtools directly, so I've been trying to design a function for it, even though I suspect one may already exist.
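
One brute-force workaround is to keep all 3^6 = 729 rows and then drop every sequence containing a repeated neighbor. A rough sketch, continuing from the permutations() call above (full, keep, and alt6 are just illustrative names; the regex matches any letter immediately followed by itself):

full <- permutations(n = 3, r = 6, v = abc, repeats.allowed = TRUE)
keep <- !grepl("([a-c])\\1", apply(full, 1, paste, collapse = ""), perl = TRUE)
alt6 <- full[keep, ]
nrow(alt6)
# [1] 96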


Solution

  • Since you're looking for permutations, expand.grid can work as well as permutations. But since you don't want like-neighbors, we can reduce its dimensionality considerably. I think this is legitimate random-wise!

    Up front:

    # one entry per position: positions 2-6 may "jump" 1 or 2 letters past the previous one
    r <- replicate(6, seq_len(length(abc)-1), simplify=FALSE)
    # the first position may take any of the three values, which fixes the starting letter
    r[[1]] <- c(r[[1]], length(abc))
    # accumulate the jumps row-wise and wrap them back into indices 1..3
    m <- t(apply(do.call(expand.grid, r), 1, cumsum) %% length(abc) + 1)
    # swap the indices for letters, keeping the matrix dimensions
    m[] <- abc[m]
    dim(m)
    # [1] 96  6
    head(as.data.frame(cbind(m, apply(m, 1, paste, collapse = ""))))
    #   Var1 Var2 Var3 Var4 Var5 Var6     V7
    # 1    b    c    a    b    c    a bcabca
    # 2    c    a    b    c    a    b cabcab
    # 3    a    b    c    a    b    c abcabc
    # 4    b    a    b    c    a    b babcab
    # 5    c    b    c    a    b    c cbcabc
    # 6    a    c    a    b    c    a acabca
    

    Walk-through:

      • r is a list with one element per position: positions 2-6 may "jump" 1 or 2 letters past the previous letter (never 0 or 3, so a letter can never repeat itself), while position 1 also gets the value 3 so that every starting letter is reachable.
      • do.call(expand.grid, r) enumerates every combination of jumps: 3 * 2^5 = 96 rows.
      • apply(..., 1, cumsum) turns each row of jumps into running offsets, and %% length(abc) + 1 wraps those offsets back into indices 1-3; because consecutive offsets differ by 1 or 2, consecutive indices are never equal.
      • t(...) puts one sequence back in each row, and m[] <- abc[m] replaces the indices with letters while preserving the matrix dimensions.
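
    To make this concrete, here are the first two rows of the jump table and how the first row turns into "bcabca" (jumps and idx are throwaway names used only for this illustration):

    jumps <- do.call(expand.grid, r)
    head(jumps, 2)
    #   Var1 Var2 Var3 Var4 Var5 Var6
    # 1    1    1    1    1    1    1
    # 2    2    1    1    1    1    1
    idx <- cumsum(unname(unlist(jumps[1, ]))) %% length(abc) + 1
    idx
    # [1] 2 3 1 2 3 1
    abc[idx]
    # [1] "b" "c" "a" "b" "c" "a"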


    Performance:

    library(microbenchmark)
    library(dplyr)
    library(tidyr)
    library(stringr)
    
    microbenchmark(
      tidy1 = {
        gtools::permutations(n = 3, r = 6, v = abc, repeats.allowed = TRUE) %>% 
          data.frame() %>% 
          unite(united, sep = "", remove = FALSE) %>%
          filter(!str_detect(united, "([a-c])\\1"))
      },
      tidy2 = {
          filter(unite(data.frame(gtools::permutations(n = 3, r = 6, v = abc, repeats.allowed = TRUE)),
                       united, sep = "", remove = FALSE),
                 !str_detect(united, "([a-c])\\1"))
      },
      base = {
        r <- replicate(6, seq_len(length(abc)-1), simplify=FALSE)
        r[[1]] <- c(r[[1]], length(abc))
        m <- t(apply(do.call(expand.grid, r), 1, cumsum) %% length(abc) + 1)
        m[] <- abc[m]
      },
      times=10000
    )
    # Unit: microseconds
    #   expr      min        lq     mean   median       uq       max neval
    #  tidy1 1875.400 2028.8510 2446.751 2165.651 2456.051 12790.901 10000
    #  tidy2 1745.402 1875.5015 2284.700 2000.051 2278.101 50163.901 10000
    #   base  796.701  871.4015 1020.993  919.801 1021.801  7373.901 10000
    

    I tried the infix (non-%>%) tidy2 version just for kicks; though I was confident it would be theoretically faster, I didn't expect it to shave over 7% off the median run-time. (The 50163 µs maximum is most likely R garbage-collecting mid-benchmark, not a "real" timing.) That overhead is the price we pay for the pipe's readability/maintainability.
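
    As a last sanity check, the base-R construction and the filter-the-permutations approach enumerate exactly the same 96 sequences (using abc and m from above; the object names below are only illustrative):

    base_strings <- apply(m, 1, paste, collapse = "")
    all_strings  <- apply(gtools::permutations(n = 3, r = 6, v = abc, repeats.allowed = TRUE),
                          1, paste, collapse = "")
    alt_strings  <- all_strings[!grepl("([a-c])\\1", all_strings, perl = TRUE)]
    setequal(base_strings, alt_strings)
    # [1] TRUE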