I am working my way through the Rosalind problems and have become stuck on the problem Ordering Strings of Varying Length Lexicographically.
So far I have succesfully managed to create all the right permutations of letters of varying lengths. The main issue is now how to sort them based on the order the letters were supplied.
The example input are the letters D N A
. But can be at most 12 unique letters in n <= 4 permutations.
For the example n = 3.
This yields 39 different permutations with replications, but these are then to be sorted lexicographically in the order of D
before N
before A
.
Meaning that the correct order is:
Correct | Incorrect |
---|---|
D | A |
DD | AA |
DDD | AAA |
DDN | AAD |
DDA | AAN |
DN | AD |
DND | ADA |
DNN | ADD |
DNA | ADN |
DA | AN |
DAD | ANA |
DAN | AND |
DAA | ANN |
... | ... |
AAD | NNA |
AAN | NND |
AAA | NNN |
My first thought was to convert them to factors with levels and then attempt to sort them based on their levels but I cannot quite make it work.
so far I create the list of all permutations and then attempt to sort it, but don't know how to get the sorting to follow the given order of D N A
text_input <- c("D", "N", "A")
n <- 3
empty_df <- data.frame(matrix("", ncol = n))
temp_df <- data.frame()
for (i in n:1) {
temp_df <- data.frame(arrangements::permutations(text_input, k = i, replace = TRUE))
empty_df <- bind_rows(empty_df, temp_df)
}
result_df <- replace(empty_df, is.na(empty_df), "") |>
unite(col = combined, everything(), sep = "", remove = FALSE) |>
mutate(across(2:(n+2), ~ factor(.x, levels = text_input)),
across(2:(n+2), ~ str_replace_na(.x, replacement = "")))
result_vec <- tail(result_df$combined, -1)
I'll use the sample data you have, Correct
, randomized to ensure we get the order right.
quux <- structure(list(Correct = c("D", "DD", "DDD", "DDN", "DDA", "DN", "DND", "DNN", "DNA", "DA", "DAD", "DAN", "DAA", "AAD", "AAN", "AAA"), Incorrect = c("A", "AA", "AAA", "AAD", "AAN", "AD", "ADA", "ADD", "ADN", "AN", "ANA", "AND", "ANN", "NNA", "NND", "NNN")), row.names = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L), class = "data.frame")
set.seed(42)
quuxR <- quux[sample(nrow(quux)),]
quuxR$Correct
# [1] "D" "DDA" "AAA" "DNA" "DA" "DDN" "DD" "AAD" "DNN" "DND" "DAD" "DAA" "AAN" "DAN" "DDD" "DN"
ltrs <- c("D", "N", "A")
inds <- lapply(strsplit(quuxR$Correct, ""), match, table = ltrs)
inds <- lapply(inds, `length<-`, max(lengths(inds)))
quuxR$Correct[do.call(order, c(do.call(Map, c(f=c, inds)), list(na.last = FALSE)))]
# [1] "D" "DD" "DDD" "DDN" "DDA" "DN" "DND" "DNN" "DNA" "DA" "DAD" "DAN" "DAA" "AAD" "AAN" "AAA"
identical(quux$Correct, quuxR$Correct[do.call(order, c(do.call(Map, c(f=c, inds)), list(na.last = FALSE)))])
# [1] TRUE
Steps:
strsplit(.., "")
splits a string into individual letters.match(.., table=ltrs)
replaces a letter with the index in ltrs
, which gives us the proper priority/sort of the letters.`length<-`
is because we are going to subsequently rbind
them, but that will only work if they all have the same length. Doing this step (along with max(lengths(..))
fills the shorter vectors with NA
s so that all sub-vectors have the same number of elements.do.call(Map, c(f=c, inds))
transposes a list with m
vectors length n
into a n
-length list, each with m
elements. This is very helpful to have for the next step ...do.call(order, ..)
is similar to calling order(L[[1]], L[[2]], L[[3]])
(if L
is the transposed list), but more programmatic. We add list(na.last=FALSE)
as an argument to the list of vectors so that we can get the right ordering of shorter strings.