Tags: python, python-3.x, list, list-comprehension, unordered

Efficiently remove duplicates, order-agnostic, from list of lists


The following list contains duplicated sublists whose elements appear in a different order:

l1 = [
    ['The', 'quick', 'brown', 'fox'],
    ['hi', 'there'],
    ['jumps', 'over', 'the', 'lazy', 'dog'],
    ['there', 'hi'],
    ['jumps', 'dog', 'over', 'lazy', 'the'],
]

How can I remove duplicates, retaining the first instance seen, to get:

l1 = [
    ['The', 'quick', 'brown', 'fox'],
    ['hi', 'there'],
    ['jumps', 'over', 'the', 'lazy', 'dog'],
]

I tried:

[list(i) for i in set(map(tuple, l1))]
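
Running it on the l1 above, all five sublists survive, in arbitrary order:

deduped = [list(i) for i in set(map(tuple, l1))]
len(deduped)  # 5, not 3 -- the permuted duplicates are still there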

This is because ['hi', 'there'] and ['there', 'hi'] map to distinct tuples, and the set also scrambles the original order. I also do not know whether this approach would be fast for large lists. Any idea of how to remove them efficiently?


Solution

  • This one is a little tricky. Ideally you would key each sublist by a frozen counter of its elements, but collections.Counter objects are not hashable in Python. For a small degradation in asymptotic complexity (sorting is O(k log k) per sublist of length k, versus O(k) for counting), you can use sorted tuples as a substitute for frozen counters:

    seen = set()    # order-insensitive keys already encountered
    result = []
    for x in l1:
        key = tuple(sorted(x))  # permutations of the same elements share this key
        if key not in seen:
            result.append(x)    # keep the first instance seen
            seen.add(key)
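
    Running this on the l1 above keeps the first occurrence of each order-insensitive duplicate:

    print(result)
    # [['The', 'quick', 'brown', 'fox'], ['hi', 'there'],
    #  ['jumps', 'over', 'the', 'lazy', 'dog']]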
    

    The same idea works as a one-liner: iterating in reverse makes the dict comprehension keep the first occurrence for each key (later assignments overwrite earlier ones), and the final [::-1] restores the original order:

    [*{tuple(sorted(k)): k for k in reversed(l1)}.values()][::-1]
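
    If the sort cost matters, or the elements are hashable but not mutually comparable, a frozenset of Counter items serves as a hashable stand-in for a frozen counter. A minimal sketch of the same loop, assuming the sublist elements are hashable:

    from collections import Counter

    seen = set()
    result = []
    for x in l1:
        # a frozenset of (element, count) pairs is hashable and
        # identical for any permutation of the same sublist
        key = frozenset(Counter(x).items())
        if key not in seen:
            result.append(x)
            seen.add(key)

    This keeps the per-sublist work linear on average. Note that tuple(sorted(x)) requires the elements to be comparable, while the Counter key only requires them to be hashable.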