python-3.xcounterpyi

What's the underlying implementation for most_common method of Counter?


I found a pyi file which has the following def

def most_common(self, n: Optional[int] = ...) -> List[Tuple[_T, int]]: ...

How could this happen? List is not defined, and no implementation?


Just highlight some valuable suggestions here for followers:

List is imported from the typing module; it's not the same thing as list. The .pyi file doesn't need to import it because stub files are never executed; they just have to be syntactically valid Python

If you use from future import annotations, you won't have to import typing to use List et al. in function annotations in .py files, either, since function annotations will be treated as string literals. (Starting in Python 4, that will be the default behavior. See PEP 563 for details.)


Solution

  • You are looking at the pyi file which is used solely for annotations. It is never executed by the Python interpreter. You can learn more about pyi files by reading PEP484.

    Using a debugger, put a breakpoint on the line where you call most_commonand then step into the method.

    Python 3.7 implementation.

    ...\Lib\collections\__init__.py:

    def most_common(self, n=None):
        '''List the n most common elements and their counts from the most
        common to the least.  If n is None, then list all element counts.
    
        >>> Counter('abcdeabcdabcaba').most_common(3)
        [('a', 5), ('b', 4), ('c', 3)]
    
        '''
        # Emulate Bag.sortedByCount from Smalltalk
        if n is None:
            return sorted(self.items(), key=_itemgetter(1), reverse=True)
        return _heapq.nlargest(n, self.items(), key=_itemgetter(1))
    

    _heapq.nlargest (in ...\Lib\heapq.py) implementation:

    def nlargest(n, iterable, key=None):
        """Find the n largest elements in a dataset.
    
        Equivalent to:  sorted(iterable, key=key, reverse=True)[:n]
        """
    
        # Short-cut for n==1 is to use max()
        if n == 1:
            it = iter(iterable)
            sentinel = object()
            if key is None:
                result = max(it, default=sentinel)
            else:
                result = max(it, default=sentinel, key=key)
            return [] if result is sentinel else [result]
    
        # When n>=size, it's faster to use sorted()
        try:
            size = len(iterable)
        except (TypeError, AttributeError):
            pass
        else:
            if n >= size:
                return sorted(iterable, key=key, reverse=True)[:n]
    
        # When key is none, use simpler decoration
        if key is None:
            it = iter(iterable)
            result = [(elem, i) for i, elem in zip(range(0, -n, -1), it)]
            if not result:
                return result
            heapify(result)
            top = result[0][0]
            order = -n
            _heapreplace = heapreplace
            for elem in it:
                if top < elem:
                    _heapreplace(result, (elem, order))
                    top, _order = result[0]
                    order -= 1
            result.sort(reverse=True)
            return [elem for (elem, order) in result]
    
        # General case, slowest method
        it = iter(iterable)
        result = [(key(elem), i, elem) for i, elem in zip(range(0, -n, -1), it)]
        if not result:
            return result
        heapify(result)
        top = result[0][0]
        order = -n
        _heapreplace = heapreplace
        for elem in it:
            k = key(elem)
            if top < k:
                _heapreplace(result, (k, order, elem))
                top, _order, _elem = result[0]
                order -= 1
        result.sort(reverse=True)
        return [elem for (k, order, elem) in result]