
How to iterate over column values to find all possible pair combinations in R?


Suppose you have a data frame with ids and elements assigned to each id. For example:

example <- data.frame(id = c(1,1,1,1,1,2,2,2,3,4,4,4,4,4,4,4,5,5,5,5),
                      vals = c("a","b",'c','d','e','a','b','d','c',
                                 'd','f','g','h','a','k','l','m', 'a',
                                 'b', 'c'))

I want to find all possible pair combinations. The main struggle here is not which R functions I can use, but the logic. How can I iterate through all elements and find patterns? For instance, a was picked together with b 3 times in my sample data frame (ids 1, 2 and 5). But the original data frame has more than 30k rows, so I cannot count these combinations manually. How do I automate finding the number of times each pair of elements was picked together?
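To make the counting rule concrete: a pair is counted once for every id whose values contain both elements. The base-R sketch below checks the a/b figure for a single pair, using the example data above. `count_pair` is a hypothetical helper introduced only for illustration; calling it for every possible pair is essentially the pair-by-pair loop that becomes slow on 30k rows, so treat it as a spot check rather than a solution.

```r
# Example data from the question
example <- data.frame(id = c(1,1,1,1,1,2,2,2,3,4,4,4,4,4,4,4,5,5,5,5),
                      vals = c("a","b","c","d","e","a","b","d","c",
                               "d","f","g","h","a","k","l","m","a",
                               "b","c"))

# Hypothetical helper: number of ids whose values include both x and y
count_pair <- function(df, x, y) {
  by_id <- split(df$vals, df$id)  # one vector of values per id
  sum(vapply(by_id, function(v) all(c(x, y) %in% v), logical(1)))
}

count_pair(example, "a", "b")  # 3 (ids 1, 2 and 5)
```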

I was thinking about widening my df with pivot_wider and then using map_lgl to find matches. But then I ran into the problem that applying map_lgl to every pair of elements would take a lot of time.
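The widening idea can work without map_lgl. A sketch, assuming only co-occurrence within an id matters: build a 0/1 id-by-value incidence matrix with base `table()`, then `crossprod()` multiplies it by its own transpose, so cell [x, y] is the number of ids containing both x and y. This matrix cross-product technique is swapped in here and is not taken from the question or the answer below; the names `inc`, `co` and `pairs` are illustrative.

```r
example <- data.frame(id = c(1,1,1,1,1,2,2,2,3,4,4,4,4,4,4,4,5,5,5,5),
                      vals = c("a","b","c","d","e","a","b","d","c",
                               "d","f","g","h","a","k","l","m","a",
                               "b","c"))

# id x value count matrix; unclass() drops the "table" class
counts <- unclass(table(example$id, example$vals))

# 0/1 incidence matrix ("> 0" ignores a value repeated within one id)
inc <- (counts > 0) * 1

# t(inc) %*% inc: cell [x, y] = number of ids containing both x and y
co <- crossprod(inc)

# Keep each unordered pair once (upper triangle) and drop zero counts
idx <- which(upper.tri(co) & co > 0, arr.ind = TRUE)
pairs <- data.frame(v1 = rownames(co)[idx[, 1]],
                    v2 = colnames(co)[idx[, 2]],
                    n  = co[idx])
pairs <- pairs[order(-pairs$n, pairs$v1, pairs$v2), ]
rownames(pairs) <- NULL

head(pairs)
```

The dense co-occurrence matrix has one row and one column per distinct value, so with many thousands of distinct values a sparse representation (for example from the Matrix package) would be needed to keep memory in check.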

I asked nearly the same question less than a month ago; other users answered it, but the result is not what I need.

Do you have any ideas on how to create a data frame with all possible combinations of values for all ids?


Solution

  • This won't (can't) be fast for many IDs with many values, because an ID with n values yields n*(n - 1)/2 pairs. If it is too slow, you need to parallelize or implement it in a compiled language (e.g., using Rcpp).

    We sort vals so that each pair is always generated in the same order (a b, never b a). We then create all combinations of two items grouped by ID, excluding IDs with only one item. Finally, we tabulate the result.

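    library(data.table)
    setDT(example)                 # convert the data.frame to a data.table by reference
    setorder(example, id, vals)    # sort so each pair always appears as (smaller, larger)
    # For each ID with at least 2 values, list all pairs as two columns "1" and "2",
    # then count how often each pair occurs across IDs.
    example[, if (.N > 1) split(combn(vals, 2), 1:2), by = id][, .N, by = c("1", "2")]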
    #    1 2 N
    # 1: a b 3
    # 2: a c 2
    # 3: a d 3
    # 4: a e 1
    # 5: b c 2
    # 6: b d 2
    # 7: b e 1
    #<...>
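For readers without data.table, a base-R sketch of the same steps (sort per ID, `combn()` per ID, tabulate). It also computes how many pairs will be generated, the sum of choose(n, 2) over IDs, which is a cheap way to check up front whether the combinatorial blow-up is a problem on the full data. Variable names are illustrative, and the `unique()` call is an added assumption that a value repeated within one ID should count only once.

```r
example <- data.frame(id = c(1,1,1,1,1,2,2,2,3,4,4,4,4,4,4,4,5,5,5,5),
                      vals = c("a","b","c","d","e","a","b","d","c",
                               "d","f","g","h","a","k","l","m","a",
                               "b","c"))

# Distinct values per id, sorted so that a pair is always (smaller, larger)
by_id <- lapply(split(example$vals, example$id),
                function(v) sort(unique(as.character(v))))

# Work estimate: an id with n distinct values yields choose(n, 2) pairs
n_pairs <- sum(choose(lengths(by_id), 2))
n_pairs  # 40 for the example data

# All 2-element combinations per id, one row per pair; ids with < 2 values skipped
pair_list <- lapply(by_id[lengths(by_id) > 1], function(v) t(combn(v, 2)))
all_pairs <- do.call(rbind, unname(pair_list))

# Count the ids each pair occurs in
res <- aggregate(n ~ v1 + v2,
                 data = data.frame(v1 = all_pairs[, 1], v2 = all_pairs[, 2], n = 1),
                 FUN = sum)
res <- res[order(res$v1, res$v2), ]
rownames(res) <- NULL
head(res)
```

Unlike the data.table version, the rows here are sorted by pair rather than by first appearance, and the columns are named v1, v2 and n instead of 1, 2 and N.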