I have a set cover problem to solve where I would like to have the names of the sets returned to me. I figure a good way to store the named sets is in a dictionary. I found this blog which implements the algorithm but uses a list of sets and I am trying to modify the code to accommodate a dictionary.
def set_cover2(universe, subsets):
"""Find a family of subsets that covers the universal set"""
elements = set(e for s in subsets for e in subsets[s])
# Check the subsets cover the universe
if elements != universe:
return None
covered = set()
cover = []
# Greedily add the subsets with the most uncovered points
while covered != elements:
subset = max(subsets.values(), key=lambda s: len(s - covered))
cover.append(subset)
covered |= subset
return cover
This returns the sets as a list:
universe = set(range(1, 11))
subsets = {"s1":set([1, 2, 3, 8, 9, 10]),
"s2":set([1, 2, 3, 4, 5]),
"s3":set([4, 5, 7]),
"s4":set([5, 6, 7]),
"s5":set([6, 7, 8, 9, 10])}
cover = set_cover2(universe, subsets)
print(cover)
i.e.
[{1, 2, 3, 8, 9, 10}, {4, 5, 7}, {5, 6, 7}]
but not the names.
To get the names I cannot use the values of the sets to identify the names as in my actual data some subsets may be the same (well i guess in that case either would do just as well). Regardless I would like a solution which returns the names of each set selected.
I imagine this must be possible by modifying the subset = max(subsets.values(), key=lambda s: len(s - covered))
line but I am not sure how to get the name of the set selected from this operation while retaining the set for the algorithm. How can I do this?
Desired output:
["s1", "s3", "s4"]
see the comments:
def set_cover2(universe, subsets):
"""Find a family of subsets that covers the universal set"""
# cosmetic change: same thing, just looks a bit nicer
elements = set().union(*subsets.values())
# ... some code here ...
while covered != elements:
# Use keys, account for this in the key function
subset = max(subsets.keys(), key=lambda s: len(subsets[s] - covered))
cover.append(subset)
# since subset is a key now, change here as well
covered |= subsets[subset]
return cover