Existing Approaches to Structural Subtyping
Abstract classes defined in collections.abc module are slightly more advanced since they implement a custom __subclasshook__() method that allows runtime structural checks without explicit registration:
from collections.abc import Iterable

class MyIterable:
    def __iter__(self):
        return []

assert isinstance(MyIterable(), Iterable)
But the Python glossary entry for "iterable" says:
"An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() method or with a __getitem__() method that implements Sequence semantics."
Note the part "or with a __getitem__()".
So I expect this code to run without any AssertionError:
from collections.abc import Iterable

class MyIterable:
    def __getitem__(self, item):
        return []

assert isinstance(MyIterable(), Iterable)
But it doesn't:
Traceback (most recent call last):
  File "file.py", line 7, in <module>
    assert isinstance(MyIterable(), Iterable)
AssertionError
So why, if an iterable may implement either __iter__ or __getitem__, does defining only __getitem__ not work when we check whether the object is an Iterable?
I also tested with mypy:
from collections.abc import Iterable

class MyIterable1:
    def __iter__(self):
        return []

class MyIterable2:
    def __getitem__(self, item):
        return []

def foo(bar: Iterable):
    ...

foo(MyIterable1())
foo(MyIterable2())
Type check result:
$ mypy .\scratch_443.py
test_file.py:15: error: Argument 1 to "foo" has incompatible type "MyIterable2"; expected "Iterable[Any]"
Found 1 error in 1 file (checked 1 source file)
While you did cite most of the relevant passages, I would like to add a little bit of additional context and another perspective.
The problem lies (as it often does) in the definitions, of which there are two in this case.
Iterable
The collections.abc.Iterable ABC is not flawed; it just relies on a narrower definition of the term. Under that definition, a class that implements the __iter__ method is considered iterable, plain and simple. Mind you, this does not (and cannot) impose any constraints on what happens inside that method or what it returns.
One consequence is that the method could technically return something silly, like an integer, even though we would reasonably expect __iter__ to always return an iterator (i.e. something implementing the __next__ method).
Case in point:
from collections.abc import Iterable

class Foo:
    def __iter__(self) -> int:
        return 1

assert isinstance(Foo(), Iterable)  # passes
iter(Foo())  # TypeError: iter() returned non-iterator of type 'int'
The error is only raised inside the iter function, which presumably checks that the class (!) of the object returned by __iter__ defines __next__. It does not check that __next__ is actually callable:
class NotReallyAnIterator:
    __next__ = None

class Foo:
    def __iter__(self) -> NotReallyAnIterator:
        return NotReallyAnIterator()

it = iter(Foo())  # passes
next(it)  # TypeError: 'NoneType' object is not callable
This last point is tangential, but still relevant to the discussion IMO.
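To make the narrower definition concrete, here is a rough sketch of the kind of check such an ABC performs. HasIter, OnlyIter and OnlyGetitem are made-up names, and the hook is a simplification for illustration, not the actual collections.abc.Iterable source. The point is that only the presence of __iter__ on the class is examined; the method is never called.

```python
from abc import ABC, abstractmethod


class HasIter(ABC):
    """Hypothetical stand-in for an ABC with a structural check."""

    @abstractmethod
    def __iter__(self):
        ...

    @classmethod
    def __subclasshook__(cls, subclass):
        if cls is HasIter:
            # Look for a class-level __iter__ anywhere in the MRO.
            # Nothing here inspects what the method does or returns.
            if any("__iter__" in vars(base) for base in subclass.__mro__):
                return True
        return NotImplemented


class OnlyIter:
    def __iter__(self):
        return 1  # nonsense, but the hook never calls it


class OnlyGetitem:
    def __getitem__(self, index):
        return index


print(isinstance(OnlyIter(), HasIter))     # True
print(isinstance(OnlyGetitem(), HasIter))  # False
```

A class that only defines __getitem__ is invisible to a check like this, which matches the behaviour you observed with Iterable.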
The term "iterable" is defined more broadly in the glossary as an object whose class either conforms to the aforementioned Iterable protocol or, as you quoted, is defined "with a __getitem__() method that implements Sequence semantics."
I want to emphasize the last portion of that sentence: "that implements Sequence semantics". This part is actually important to understanding the problem at hand. It is unfortunately not expanded on further in the glossary, but if we look at the documentation for the built-in iter(), which is (as the docs tell us) the only reliable way of checking whether an object is iterable, we find the following clarification. It says the argument "must be a collection object which supports the iterable protocol (the __iter__() method), or it must support the sequence protocol (the __getitem__() method with integer arguments starting at 0)."
This qualification is important because simply having the __getitem__ method does not make a class a Sequence. It is a necessary but not sufficient requirement: the Mapping protocol, for example, also requires __getitem__ to be implemented, yet neither of those two is a subclass of the other (as the collections.abc documentation shows).
__getitem__ merely allows subscripting an instance with a key (i.e. using square brackets [key] with it), whereas the sequence protocol requires an accepted key to be an integer (or slice).
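As a minimal sketch of what "supporting the sequence protocol" can look like without __iter__, consider the hypothetical Squares class below (the name and behaviour are invented for illustration). It accepts integer indices starting at 0 and raises IndexError past the end, which is what lets iteration terminate.

```python
from collections.abc import Iterable


class Squares:
    """Hypothetical class with only __getitem__, following sequence semantics."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("indices must be integers")
        if not 0 <= index < self.n:
            # IndexError tells the legacy iteration protocol to stop.
            raise IndexError(index)
        return index * index


squares = Squares(4)
print(list(squares))                  # [0, 1, 4, 9]
print(isinstance(squares, Iterable))  # False
```

The object iterates fine, yet the ABC check still rejects it, because nothing about the class itself reveals that its __getitem__ behaves this way.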
Why is this relevant?
Because while we can know whether an object's class implements __getitem__, it is impossible to know from the outside how it implements it. A Sequence subtype should raise an error if we were to call its __getitem__ with a string, for example. But how can we know that it does? Only by calling it.
And since it is specifically the sequence protocol (and not just any __getitem__ method) that constitutes an "iterable" in this broader sense when __iter__ is absent, there is no way to determine from the class alone whether it should or should not be considered iterable.
To top this all off, consider the following example:
class Bar:
    def __getitem__(self, key: str) -> str:
        return key.upper()

it = iter(Bar())  # passes
print(next(it))  # AttributeError: 'int' object has no attribute 'upper'
I would argue that Bar is a perfectly valid (albeit not very useful) example of a subscriptable class. An instance even passes the iter() check! Yet should it be considered an iterable? Both the documentation and common sense say no.
Determining whether or not something is "iterable" comes down to what you mean by the term. And I would argue that (if anything) the documentation's suggestion that iter() is reliable in this regard is misleading. The simple subclass check with the ABC Iterable is not sufficient if you consider the sequence protocol to also be a reasonable version of an iterable.
IMHO, the only actually reliable way of determining if an object is iterable is to chain a next() call with an iter() call, which in practice amounts to a plain for-loop. If that raises an error, the object is not iterable.
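That idea can be sketched as a small helper. The name is_iterable is hypothetical, and the sketch makes two assumptions of its own: StopIteration is treated as success (an empty iterable is still iterable, just as a for-loop over it would simply do nothing), and any other exception counts as "not iterable". Keep in mind that probing a one-shot iterator this way consumes its first item.

```python
def is_iterable(obj):
    """Hypothetical helper: probe iterability by actually starting an iteration."""
    try:
        next(iter(obj))
    except StopIteration:
        # Iteration started and ended immediately: empty, but iterable.
        return True
    except Exception:
        # iter() or the first next() failed, so a for-loop would fail too.
        return False
    return True


class Upper:
    # Subscriptable with strings only, like the Bar class above.
    def __getitem__(self, key):
        return key.upper()


print(is_iterable([1, 2, 3]))  # True
print(is_iterable([]))         # True
print(is_iterable(42))         # False
print(is_iterable(Upper()))    # False
```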
Final example:
from __future__ import annotations

class RealIter:
    def __iter__(self) -> RealIter:
        print(f"called {self.__class__.__name__}.__iter__")
        return self

    def __next__(self) -> str:
        print(f"called {self.__class__.__name__}.__next__")
        return "Hi, mom!"

class SeqIter:
    def __getitem__(self, key: int) -> str:
        print(f"called {self.__class__.__name__}.__getitem__({key})")
        return "Hi, mom!"

for item in RealIter():
    print(item)
    break

for item in SeqIter():
    print(item)
    break
Output:
called RealIter.__iter__
called RealIter.__next__
Hi, mom!
called SeqIter.__getitem__(0)
Hi, mom!