python, list, duplicates, graph-theory

Removing list duplicates given an index symmetry in Python


In Python, given a list mylist whose elements el are lists of integers, I would like to remove duplicates, where two elements count as duplicates if they are equivalent under specific permutations of the indices.

The question is more general, but what I have in mind are "decorated" McKay graphs, where each node is assigned an integer; these integers make up el and their sum is fixed to a certain number. Generating mylist is easy enough, but there is a lot of redundancy depending on the symmetry of the graph. For instance, for the "D4" diagram, el represents the following decorated graph:

                                       a
                                       |
    el = [a, b, c, d, e]     <-->    b-c-d
                                       |
                                       e

The two lists [1,0,4,0,0] and [0,0,4,0,1] are duplicates because they are the same up to a reflection of the graph around the horizontal axis. More generally, two lists el are duplicates if they are equivalent under any permutation of a, b, d, e. Is there a simple way of removing duplicates given an arbitrary symmetry?

The way I did it for this particular graph was to sort the values at indices 0, 1, 3, 4 and compare:

from itertools import product

def check_duplicate(f, li):
    # f duplicates an element of li if the middle entries agree and the
    # remaining entries agree as multisets (i.e. after sorting)
    for l in li:
        if l[2] == f[2] and sorted(l[:2] + l[3:]) == sorted(f[:2] + f[3:]):
            return True
    return False

mylist = [list(i) for i in product(range(5), repeat=5) if sum(i) == 5]

newlist = []
for f in mylist:
    if not check_duplicate(f, newlist):
        newlist.append(f)

Here mylist is generated by brute force, but the condition is more involved in my real case.

This works well enough for this particular example, but it is a bit clunky, and the implementation is hard to generalize to more involved cases. Is there a more efficient way to remove the duplicates, in particular one that can easily be adapted to an arbitrary symmetry of the indices of el?


Solution

  • It will probably be better to add "seen" cases to a set and then check whether each new element has already been "seen", rather than comparing every element to every other element in the list (an operation with O(n^2) time complexity). Hopefully I understand what you're trying to accomplish, but here's how I'd do it:

    from itertools import product
    
    def to_key(f):
        # Create a hashable key: keep the middle value as-is and sort the
        # order-insensitive outer values
        return (f[2], tuple(sorted(f[:2] + f[3:])))
    
    mylist = [list(i) for i in product(range(5), repeat=5) if sum(i) == 5]
    
    # Keys we have already encountered
    seen = set()
    # Deduplicated elements, one representative per key
    newlist = []
    
    for f in mylist:
        key = to_key(f)        # Create the hashable key
        if key not in seen:    # Only keep f if its key is new
            seen.add(key)
            newlist.append(f)
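
  • To handle an arbitrary symmetry of the indices rather than "sort these particular positions", one option (a minimal sketch; canonical_key and d4_symmetry are illustrative names, not from the original post) is to describe the symmetry explicitly as a collection of index permutations and use the lexicographically smallest relabelling of el as the set key; the permutation list can be whatever automorphism group your graph has:

    from itertools import permutations, product
    
    def canonical_key(el, symmetry):
        # symmetry is an iterable of index permutations; applying p means the
        # value at position i is taken from el[p[i]].  The lexicographically
        # smallest relabelling is the same for every member of an orbit, so it
        # can serve as the set key.
        return min(tuple(el[i] for i in p) for p in symmetry)
    
    # "D4" example: index 2 (the central node) stays fixed while indices
    # 0, 1, 3, 4 (the outer nodes) may be permuted arbitrarily.
    d4_symmetry = [(p[0], p[1], 2, p[2], p[3]) for p in permutations((0, 1, 3, 4))]
    
    mylist = [list(i) for i in product(range(5), repeat=5) if sum(i) == 5]
    
    seen = set()
    newlist = []
    for f in mylist:
        key = canonical_key(f, d4_symmetry)
        if key not in seen:
            seen.add(key)
            newlist.append(f)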