r dataframe combinations

Generate Combinations of Comma-Separated Characters for Two Columns


I have an (example) data frame, including an identifier (here "ID") and two variables of interest (here "V1" and "V2"):

df <- data.frame(ID = c("Sample 1", "Sample 2", "Sample 3"), 
                 V1 = c("A, B, C",  "E"       , "A, F"),
                 V2 = c("H, G"   ,  "C, A"    , "J"))

For the two variables (i.e., columns) of interest, I would like to generate, row by row, all possible combinations of the comma-separated values, so that each combination remains identifiable via the identifier.

The result data frame might look like this (whereby some flexibility in the order and structure is acceptable as long as the combinations are complete):

rd <- data.frame(ID = c("Sample 1","Sample 1", "Sample 1", "Sample 1", "Sample 1", "Sample 1", "Sample 2", "Sample 2", "Sample 3", "Sample 3"),
                 V1 = c("A", "A", "B", "B", "C", "C", "E", "E", "A", "F"),
                 V2 = c("H","G","H","G","H","G","C","A","J","J"))

Thank you very much for your support.


Solution

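  • library(tidyr)
    # Split V1 into one row per value, then do the same for V2.
    # Chaining both calls yields every V1/V2 pairing within each ID.
    # Note: the native pipe |> requires R >= 4.1.
    df |>
      separate_rows(V1) |>
      separate_rows(V2)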
    

    Result

             ID V1 V2
    1  Sample 1  A  H
    2  Sample 1  A  G
    3  Sample 1  B  H
    4  Sample 1  B  G
    5  Sample 1  C  H
    6  Sample 1  C  G
    7  Sample 2  E  C
    8  Sample 2  E  A
    9  Sample 3  A  J
    10 Sample 3  F  J
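
For readers who prefer to avoid the tidyr dependency, the same row-wise Cartesian product can be sketched in base R. This is only an illustrative alternative, not part of the answer above: the helper names (v1, v2, rows, res) are made up for this example, and it assumes the values are separated by a comma followed by optional whitespace.

```r
# Example data from the question (stringsAsFactors = FALSE keeps this
# working on R versions older than 4.0 as well).
df <- data.frame(ID = c("Sample 1", "Sample 2", "Sample 3"),
                 V1 = c("A, B, C", "E", "A, F"),
                 V2 = c("H, G", "C, A", "J"),
                 stringsAsFactors = FALSE)

# Split every cell on "comma plus optional whitespace";
# each result is a list with one character vector per row.
v1 <- strsplit(df$V1, ",\\s*")
v2 <- strsplit(df$V2, ",\\s*")

# Build the per-row Cartesian product. expand.grid() varies its first
# argument fastest, so V2 is passed first to get V1-major ordering.
rows <- Map(function(id, a, b) {
  g <- expand.grid(V2 = b, V1 = a, stringsAsFactors = FALSE)
  data.frame(ID = id, V1 = g$V1, V2 = g$V2, stringsAsFactors = FALSE)
}, df$ID, v1, v2)

# Stack the per-ID pieces and drop the row names that rbind derives
# from the list names.
res <- do.call(rbind, rows)
rownames(res) <- NULL
res
```

Because each row is expanded independently, an ID with m values in V1 and n values in V2 contributes m × n rows, which matches the 6 + 2 + 2 = 10 rows in the result above.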