pythonpython-typinggeneric-collections

Difference between collections.abc.Sequence and typing.Sequence


I was reading an article and about collection.abc and typing class in the python standard library and discover both classes have the same features.

I tried both options using the code below and got the same results

from collections.abc import Sequence

def average(sequence: Sequence):
    return sum(sequence) / len(sequence)

print(average([1, 2, 3, 4, 5]))  # result is 3.0

from typing import Sequence

def average(sequence: Sequence):
    return sum(sequence) / len(sequence)

print(average([1, 2, 3, 4, 5])) # result is 3.0


Under what condition will collection.abc become a better option to typing. Are there benefits of using one over the other?


Solution

  • Good on you for using type annotations! As the documentations says, if you are on Python 3.9+, you should most likely never use typing.Sequence due to its deprecation. Since the introduction of generic alias types in 3.9 the collections.abc classes all support subscripting and should be recognized correctly by static type checkers of all flavors.

    So the benefit of using collections.abc.T over typing.T is mainly that the latter is deprecated and should not be used.

    As mentioned by jsbueno in his answer, annotations will have no runtime implications either way, unless of course they are explicitly picked up by a piece of code (see my other answer here). They are just an essential part of good coding style. But your function would still work, i.e. your script would execute without error, even if you annotated your function with something absurd like def average(sequence: 4%3): ....


    Proper annotations are still extremely valuable. Thus, I would recommend you get used to some of the best practices as soon as possible. (A more-or-less strict static type checker like is very helpful for that.) For one thing, when you are using generic types like Sequence, you should always provide the appropriate type arguments. Those may be type variables, if your function is also generic or they may be concrete types, but you should always include them.

    In your case, assuming you expect the contents of your sequence to be something that can be added with the same type and divided by an integer, you might want to e.g. annotate it as Sequence[float]. (In the Python type system, float is considered a supertype of int, even though there is no nominal inheritance.)

    Another recommendation is to try and be as broad as possible in the parameter types. (This echoes the Python paradigm of dynamic typing.) The idea is that you just specify that the object you expect must be able to "quack", but you don't say it must be a duck.

    In your example, since you are reliant on the argument being compatible with sum as well as with len, you should consider what types those functions expect. The len function is simple, since it basically just calls the __len__ method of the object you pass to it. The sum function is more nuanced, but in your case the relevant part is that it expects an iterable of elements that can be added (e.g. float).

    If you take a look at the collections ABCs, you'll notice that Sequence actually offers much more than you need, being that it is a reversible collection. A Collection is the broadest built-in type that fulfills your requirements because it has __iter__ (from Iterable) and __len__ (from Sized). So you could do this instead:

    from collections.abc import Collection
    
    
    def average(numbers: Collection[float]) -> float:
        return sum(numbers) / len(numbers)
    

    (By the way, the parameter name should not reflect its type.)

    Lastly, if you wanted to go all out and be as broad as possible, you could define your own protocol that is even broader than Collection (by getting rid of the Container inheritance):

    from collections.abc import Iterable, Sized
    from typing import Protocol, TypeVar
    
    
    T = TypeVar("T", covariant=True)
    
    
    class SizedIterable(Sized, Iterable[T], Protocol[T]):
        ...  # Literal ellipsis, not a placeholder
    
    
    def average(numbers: SizedIterable[float]) -> float:
        return sum(numbers) / len(numbers)
    

    This has the advantage of supporting very broad structural subtyping, but is most likely overkill.

    (For the basics of Python typing, PEP 483 and PEP 484 are a must-read.)