Tags: r, combinations, expand

Get all level combinations for each group


I have a list of customer IDs, each with a list of unique products they used. There can theoretically be up to ~150 unique products.

library(tibble)

df <- tibble(ID = c(1,1,1,2,2,3,3,4),
             prod = c("Prod1", "Prod2", "Prod3", "Prod1", "Prod4", "Prod3", "Prod5", "Prod2"))

From that, I need all possible combinations of products for each ID, at every combination size rather than only the full set. That is, the result should include the combination of all of an ID's products (as expand_grid() would do), but also every combination of 1, ..., n elements, where n is the number of unique products that ID has.

The final dataset should therefore look like this:

df_results <- tibble(ID = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4),
                     combo = c("Prod1", "Prod2", "Prod3", "Prod1|Prod2", "Prod1|Prod3", "Prod2|Prod3", "Prod1|Prod2|Prod3",
                               "Prod1", "Prod4", "Prod1|Prod4",
                               "Prod3", "Prod5", "Prod3|Prod5",
                               "Prod2"))
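
As a sanity check on the expected output: an ID with n unique products has 2^n - 1 non-empty combinations, so the four IDs above should give 7 + 3 + 3 + 1 = 14 rows. A minimal base R sketch of that count, assuming the same ID and prod columns as above (the names n_prod and n_combo are introduced here only for illustration):

```r
# Same data as above, built with base R only
df <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 3, 4),
                 prod = c("Prod1", "Prod2", "Prod3", "Prod1", "Prod4", "Prod3", "Prod5", "Prod2"))

# Number of unique products per ID (illustrative name)
n_prod <- tapply(df$prod, df$ID, function(p) length(unique(p)))

# Number of non-empty subsets of n items: 2^n - 1
n_combo <- 2^n_prod - 1

n_combo
sum(n_combo)
```

The count doubles with every additional product, so with the theoretical maximum of ~150 products per ID a full enumeration would not be feasible; it is only practical for IDs with a small number of products.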

Solution

  • An extension of the canonical answer:

    library(dplyr)
    df %>% 
      group_by(ID) %>% 
      # for each ID, build every combination of size 1..n and collapse it with "|"
      reframe(combo = as.character(do.call(c, lapply(
        seq_along(prod),
        \(m) combn(x = prod, m = m, FUN = \(x) paste(x, collapse = "|"))
      ))))
    
    # A tibble: 14 × 2
          ID combo            
       <dbl> <chr>            
     1     1 Prod1            
     2     1 Prod2            
     3     1 Prod3            
     4     1 Prod1|Prod2      
     5     1 Prod1|Prod3      
     6     1 Prod2|Prod3      
     7     1 Prod1|Prod2|Prod3
     8     2 Prod1            
     9     2 Prod4            
    10     2 Prod1|Prod4      
    11     3 Prod3            
    12     3 Prod5            
    13     3 Prod3|Prod5      
    14     4 Prod2           
    

    Or in base R:

    # stack() returns the columns (values, ind); [2:1] puts the ID column first
    stack(tapply(df$prod, df$ID, 
           \(prod) do.call(c, lapply(seq_along(prod), \(m) combn(prod, m, FUN = \(x) paste(x, collapse = "|"))))))[2:1]
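
For readability, the per-group step that both versions share can be pulled out into a named helper. This is only a sketch of the same idea: all_combos() and its sep argument are hypothetical names, not part of dplyr or base R, and the input is assumed to be a character vector of unique products.

```r
# Hypothetical helper: every non-empty combination of a character vector,
# with each combination collapsed into a single string
all_combos <- function(x, sep = "|") {
  unlist(lapply(seq_along(x), function(m) {
    # combn() applies FUN to each combination of size m
    combn(x, m, FUN = function(v) paste(v, collapse = sep))
  }))
}

all_combos(c("Prod1", "Prod2", "Prod3"))
```

With such a helper, the dplyr version would reduce to reframe(combo = all_combos(prod)) after group_by(ID). The character-vector assumption matters because combn() treats a single positive number as shorthand for seq_len(x), which would change the result for a one-element numeric input.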