pythonalgorithmoptimizationset-cover

Optimizing Itertools Results in Python


I am calling itertools in python (see below). In this code, snp_dic is a dictionary with integer keys and sets as values. The goal here is to find the minimum list of keys whose union of values is a combination of unions of sets that is equivalent to the set_union. (This is equivalent to solving for a global optimum for the popular NP-hard graph theory problem set-cover for those of you interested)! The algorithm below works but the goal here is optimization.

The most obvious optimization I see has to do with itertools. Let's say for a length r, there exists a combination of r sets in snp_dic whose union = set_union. Basic probability dictates that if this combination exists and is distributed somewhere uniformly at random over the combinations, it is expected to on average only have to iterate over have the combinations to find this set-covering combination. Itertools however will return all the possible combinations, taking twice as long as the expected time of checking set_unions by checking at each iteration.

A logical solution would seem to be simply by to implement itertools.combinations() locally. Based on the "equivalent" python implementation of itertools.combinations() in the python docs however the time is approximately twice as slow because itertools.combinations calls a C level implementation rather than a python-native one.

The question (finally) is then, how can I stream the results of itertools.combinations() one by one so I can check set unions as I go along so it still runs at a near equivalent time as the python implementation of itertools.combinations(). In an answer I would appreciate if you could include the results of timing your new method to prove it runs at a similar time as the python-native implementation. Any other optimizations also appreciated.

def min_informative_helper(snp_dic, min, set_union):
    union = lambda set_iterable : reduce(lambda a,b: a|b, set_iterable) #takes the union of sets
    for i in range(min, len(snp_dic)):
       combinations = itertools.combinations(snp_dic, i)
       combinations = [{i:snp_dic[i] for i in combination} for combination in combinations]
       for combination in combinations:
           comb_union = union(combination.values())
           if(comb_union == set_union):
           return combination.keys()

Solution

  • itertools provides generators for the things it returns. To stream them simply use

    for combo in itertools.combinations(snp_dic, i):
        ... remainder of your logic
    

    The combinations method returns one new element each time you access it: one per loop iteration.