[SOLVED] How to process each item in an ordered list with duplicates only once in Python?

How to process each item in an ordered list with duplicates only once in Python?

I have an ordered list of things to process that includes some duplicates and I only want to process the first occurrence. Presently, I'm doing it like this in Python v2.7:

seen = set()
for (value, fmt) in formats:
  if fmt not in seen:
    seen.add(fmt)
    process(value, fmt)

Is there anyway to simultaneously insert a new element into seen and detect whether or not it was already present? (This would avoid the repeated lookup of fmt in the set.)

seen = set()
for (value, fmt) in formats:
  # myInsert() would return true if item was not already present.
  if seen.myInsert(fmt):
    process(value, fmt)

Or, alternatively, can I somehow just filter my formats to exclude duplicate entries before looping?

unique_formats = removeDuplicates(formats, key=itemgetter(1))
for (value, fmt) in unique_formats:
  process(value, fmt)

Solution

You could take the length of the set before and after the add(). If it didn't change, the format was already in the set.

seen = set()
for (value, fmt) in formats:
    l1 = len(seen)
    seen.add(fmt)
    if l1 != len(seen):
         process(value, fmt)

Your question presumes that the in test is a costly operation. This turns out not to be the case. Using len() can take more time, although both are quite fast;

In [4]: seen = set(range(10000))

In [5]: %timeit 5995 in seen
10000000 loops, best of 3: 122 ns per loop

In [6]: %timeit len(seen)
1000000 loops, best of 3: 167 ns per loop

(measured with CPython 2.7.3 on a 2.5 GHz Core Quad Q9300)