I have a list of number-letter pairs like:
all_edges_array = [
[1,'a'],[1,'b'],[1,'c'],
[2,'c'],[2,'d'],
[3,'b'],[3,'c']
]
Notice that the input pairs are not a cross-product of the letters and numbers used - for example, [2, 'a']
is missing.
I want to efficiently find the combinations of some number of pairs, such that within a group, no two pairs use the same number or the same letter.
For the above input, there should be 5 total results: [([1, 'a'], [2, 'c'], [3, 'b']), ([1, 'a'], [2, 'd'], [3, 'b']), ([1, 'a'], [2, 'd'], [3, 'c']), ([1, 'b'], [2, 'd'], [3, 'c']), ([1, 'c'], [2, 'd'], [3, 'b'])]
.
Other combinations are not valid: for example, ([1, 'a'], [1, 'b'], [3, 'c'])
contains two pairs using the same number (1), and ([1, 'b'], [2, 'c'], [3, 'b'])
contains two pairs using the same letter (b).
I have code which brute-forces this by using itertools.combinations
and then filtering the result:
from itertools import combinations
number_letter_dict = {1:['a','b','c'], 2:['c','d'], 3:['b','c']}
# create the edges based on the connections in the number_letter_dict
all_edges_array = []
for key in number_letter_dict.keys():
for item in number_letter_dict[key]:
all_edges_array.append([key, item])
# get the number of connections relative to the number of keys in dict
number_of_connections = len(number_letter_dict.keys())
# Find all 35 combinations
all_combinations_array = list(combinations(all_edges_array, number_of_connections))
# cut down the list of combinations to what I actually need
all_good_combinations = []
for collection in all_combinations_array:
duplicated_item = False
seen_indices = []
for item in collection:
if item[0] in seen_indices:
duplicated_item = True
break
if item[1] in seen_indices:
duplicated_item = True
break
seen_indices.append(item[0])
seen_indices.append(item[1])
# all clear--add the collection! :)
if not duplicated_item:
all_good_combinations.append(collection)
This works, but it's inefficient - it takes an unacceptably long time to run for my actual input. Many more combinations are generated than are valid, which only gets worse the more edges and connections there are.
How can I improve on this algorithm? I assume that it involves not generating invalid combinations in the first place, but I don't see a way to accomplish that.
I found the previous Q&A Python: How to generate all combinations of lists of tuples without repeating contents of the tuple, but it doesn't answer my question. The answers there assume that the input contains all possible pairs (and also that the number of pairs in the combinations should be equal to the number of possibilities for the more constrained pair element).
EDIT: I have replaced some of the minimized code because it caused more confusion than it saved: oops? Also, this code does work. When given enough time it will reliably give the correct answer. That said, spending five days processing one image is not quite fast enough for my tastes.
It is always inefficient to generate all combinations and throw away unwanted combinations when you can selectively generate wanted combinations to begin with.
The desired combinations can be most efficiently generated by recursively yielding combinations of unused letters paired with the number of the current recursion level, where a set can be used to keep track used letters:
def unique_combinations(number_letter_dict):
def _unique_combinations(index):
if index == size:
yield ()
return
number, letters = number_letters[index]
for letter in letters:
if letter not in used_letters:
used_letters.add(letter)
for combination in _unique_combinations(index + 1):
yield [number, letter], *combination
used_letters.remove(letter)
number_letters = list(number_letter_dict.items())
size = len(number_letters)
used_letters = set()
return list(_unique_combinations(0))
so that:
number_letter_dict = {1: ['a', 'b', 'c'], 2: ['c', 'd'], 3: ['b', 'c']}
print(unique_combinations(number_letter_dict))
outputs:
[([1, 'a'], [2, 'c'], [3, 'b']), ([1, 'a'], [2, 'd'], [3, 'b']), ([1, 'a'], [2, 'd'], [3, 'c']), ([1, 'b'], [2, 'd'], [3, 'c']), ([1, 'c'], [2, 'd'], [3, 'b'])]