python iterator

Understanding iterable types in comparisons


Recently I ran into cosmologicon's pywats, and now I'm trying to understand the part about fun with iterators:

>>> a = 2, 1, 3
>>> sorted(a) == sorted(a)
True
>>> reversed(a) == reversed(a)
False

OK, sorted(a) returns a list, so sorted(a) == sorted(a) is just a comparison of two lists. But reversed(a) returns a reversed object. So why are these reversed objects different?


Solution

  • The basic reason why id(reversed(a)) == id(reversed(a)) returns True, whereas reversed(a) == reversed(a) returns False, can be seen from the example below, which uses a custom class -

    >>> class CA:
    ...     def __del__(self):
    ...             print('deleted', self)
    ...     def __init__(self):
    ...             print('inited', self)
    ...
    >>> CA() == CA()
    inited <__main__.CA object at 0x021B8050>
    inited <__main__.CA object at 0x021B8110>
    deleted <__main__.CA object at 0x021B8050>
    deleted <__main__.CA object at 0x021B8110>
    False
    >>> id(CA()) == id(CA())
    inited <__main__.CA object at 0x021B80F0>
    deleted <__main__.CA object at 0x021B80F0>
    inited <__main__.CA object at 0x021B80F0>
    deleted <__main__.CA object at 0x021B80F0>
    True
    

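    As you can see, when you did customobject == customobject, the objects created on the fly were not destroyed until after the comparison occurred, because both objects were required for the comparison. Since CA does not define __eq__, == falls back to comparing identity, and two distinct objects that are alive at the same time are never identical, hence False. The same applies to reversed objects, which do not define their own equality either.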

    But in the case of id(co) == id(co), the custom object was passed to the id() function, and only the result of id() is required for the comparison. The object therefore had no references left and was destroyed immediately (CPython frees an object as soon as its reference count drops to zero). When the interpreter then created a new object for the right-hand side of the == operation, it reused the memory that had just been freed. Hence, the id came out the same for both.
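
    A minimal sketch of the same effect with reversed objects, assuming CPython. The names r1 and r2 are illustrative and not from the question. As long as both objects are bound to names, they are alive at the same time and so cannot share an id:

```python
a = (2, 1, 3)

# Keep both reversed objects alive by binding them to names.
r1 = reversed(a)
r2 = reversed(a)

# Two objects that exist at the same time never share an id.
ids_differ = id(r1) != id(r2)

# reversed objects fall back to identity for ==, so an object equals
# itself but not another reversed object built from the same data.
same_object_equal = (r1 == r1)
distinct_objects_equal = (r1 == r2)

print(ids_differ, same_object_equal, distinct_objects_equal)
```

    With both references held, the ids differ and == behaves like an identity check.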

    The id reuse described above is an implementation detail of CPython (it may or may not differ in other implementations of Python), and you should never rely on the equality of ids. For example, in the case below it gives the wrong result -

    >>> a = [1,2,3]
    >>> b = [4,5,6]
    >>> id(reversed(a)) == id(reversed(b))
    True
    

    The reason is again the one explained above: the reversed object created for reversed(a) is destroyed before the reversed object for reversed(b) is created.
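
    If the goal is to compare the reversed contents rather than the objects, the simplest option is probably to materialize both iterators into lists first. A small sketch (the variable names same_content and different_content are illustrative):

```python
a = [1, 2, 3]
b = [4, 5, 6]

# list() consumes each reversed iterator, and lists compare element by element.
same_content = list(reversed(a)) == list(reversed(a))
different_content = list(reversed(a)) == list(reversed(b))

print(same_content, different_content)
```

    This builds two temporary lists, which is fine for small inputs.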


    If the lists are large, I think the most memory-efficient and most probably the fastest way to compare two iterators for equality would be to use the all() built-in function along with zip() in Python 3.x (or itertools.izip() in Python 2.x). Note that zip() stops at the shorter input, so this check is only meaningful when both iterators yield the same number of elements.

    Example for Python 3.x -

    all(x==y for x,y in zip(aiterator,biterator))
    

    Example for Python 2.x -

    from itertools import izip
    all(x==y for x,y in izip(aiterator,biterator))
    

    This is because all() short-circuits at the first False value it encounters, and zip() in Python 3.x returns an iterator that yields the corresponding elements from both iterators. This does not need to create a separate list in memory.
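
    The short-circuiting can be made visible with a small sketch for Python 3.x. The generator noisy() and the list seen are hypothetical helpers, introduced only to record which elements are actually pulled from the first input:

```python
def noisy(values, log):
    """Yield each value, recording it in log when it is consumed."""
    for value in values:
        log.append(value)
        yield value

seen = []
# The second pair (2, 9) differs, so all() stops there.
result = all(x == y for x, y in zip(noisy([1, 2, 3, 4], seen), [1, 9, 3, 4]))

print(result, seen)
```

    Only the first two elements are consumed; the remaining ones are never produced.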

    Demo -

    >>> a = [1,2,3]
    >>> b = [4,5,6]
    >>> all(x==y for x,y in zip(reversed(a),reversed(b)))
    False
    >>> all(x==y for x,y in zip(reversed(a),reversed(a)))
    True