pythondictionarysetset-cover

Return keys when solving Set Cover problem with a dictionary and the greedy algorithm


I have a set cover problem to solve where I would like to have the names of the sets returned to me. I figure a good way to store the named sets is in a dictionary. I found this blog which implements the algorithm but uses a list of sets and I am trying to modify the code to accommodate a dictionary.

def set_cover2(universe, subsets):
    """Find a family of subsets that covers the universal set"""
    elements = set(e for s in subsets for e in subsets[s])
    # Check the subsets cover the universe
    if elements != universe:
        return None
    covered = set()
    cover = []
    # Greedily add the subsets with the most uncovered points
    while covered != elements:
        subset = max(subsets.values(), key=lambda s: len(s - covered))
        cover.append(subset)
        covered |= subset

    return cover

This returns the sets as a list:

universe = set(range(1, 11))
subsets = {"s1":set([1, 2, 3, 8, 9, 10]),
           "s2":set([1, 2, 3, 4, 5]),
           "s3":set([4, 5, 7]),
           "s4":set([5, 6, 7]),
           "s5":set([6, 7, 8, 9, 10])}


cover = set_cover2(universe, subsets)
print(cover)

i.e.

[{1, 2, 3, 8, 9, 10}, {4, 5, 7}, {5, 6, 7}]

but not the names.

To get the names I cannot use the values of the sets to identify the names as in my actual data some subsets may be the same (well i guess in that case either would do just as well). Regardless I would like a solution which returns the names of each set selected.

I imagine this must be possible by modifying the subset = max(subsets.values(), key=lambda s: len(s - covered)) line but I am not sure how to get the name of the set selected from this operation while retaining the set for the algorithm. How can I do this?

Desired output:

["s1", "s3", "s4"]

Solution

  • see the comments:

    def set_cover2(universe, subsets):
        """Find a family of subsets that covers the universal set"""
        # cosmetic change: same thing, just looks a bit nicer
        elements = set().union(*subsets.values())
    
        # ... some code here ...
      
        while covered != elements:
            # Use keys, account for this in the key function
            subset = max(subsets.keys(), key=lambda s: len(subsets[s] - covered))
            cover.append(subset)
            # since subset is a key now, change here as well
            covered |= subsets[subset]
    
        return cover