python-3.xsetcombinationsintersection

How to find the largest intersection among multiple subsets (most common element in n out of k sets)?


Just an example:

set_0={0,3,4}
set_1={1,3,4}
set_2={1,5,23,8,24}
set_4={1,2,6,10}
set_5={1,60,34,2}
set_6={1,45,32,4}
set_7={1,6,9,14}
set_8={1,56,3,23}
set_9={1,34,23,3}
all_intersection=set.intersection(set_0,set_1,set_2,set_3,set_4, set_5, set_6, set_7, set_8, set_9)

gives empty set. Is there any way I can find the intersection among all possible combinations of 9 out of 10 sets in a pythonic way (perhaps without the brute force approach).

For this dataset I would expect to retrieve 1.


Solution

  • How to find the largest intersection among multiple subsets (most common element in n out of k sets)?

    Trying to call intersection on the class set is going to lead to errors because the class set isn't a set() object. Although, even choosing a proper set to form a basis of intersection won't determine the frequency of elements. That will only tell you which elements are in all sets (if any). The Counter class from the collections module can provide such a frequency.

    But, there isn't a clear "most common element" in n out of k sets, since all elements that are in exactly n out of k sets are equally as common. You can, however, clearly determine how common an element is among k chosen sets.

    from collections import Counter
    
    set_0 = {0, 3, 4}
    set_1 = {1, 3, 4}
    set_2 = {1, 5, 23, 8, 24}
    set_4 = {1, 2, 6, 10}  # you're missing set_3
    set_5 = {1, 60, 34, 2}
    set_6 = {1, 45, 32, 4}
    set_7 = {1, 6, 9, 14}
    set_8 = {1, 56, 3, 23}
    set_9 = {1, 34, 23, 3}
    my_sets = (set_0, set_1, set_2, set_4, set_5, set_6, set_7, set_8, set_9)
    
    # This leads to the empty set you mentioned:
    all_elements = set().union(*my_sets)
    elements_shared_among_all_sets = all_elements.intersection(*my_sets)
    print(f"elements in all sets: {elements_shared_among_all_sets}")
    # elements in all sets: set()
    
    # This counter will show (element, n)-pairs:
    counter = Counter(element for sub_set in my_sets for element in sub_set)
    print(f"most common 5 elements: {counter.most_common(5)}")
    # most common 5 elements: [(1, 8), (3, 4), (4, 3), (23, 3), (2, 2)]
    
    # Explicit n out of k example:
    element = 1
    n = counter[element]
    k = len(my_sets)
    print(f"{element=} is in {n=} out of {k=} sets.")
    # element=1 is in n=8 out of k=9 sets