pythonheap

Understanding how to create a heap in Python


The collections.Count.most_common function in Python uses the heapq module to return the count of the most common word in a file, for instance.

I have traced through the heapq.py file, but I'm having a bit of trouble understanding how a heap is created/updated with respect to words let's say.

So, I think the best way for me to understand it, is to figure out how to create a heap from scratch.

Can someone provide a pseudocode for creating a heap that would represent word count?


Solution

  • this is a slightly modified version of the code found here : http://code.activestate.com/recipes/577086-heap-sort/

    def HeapSort(A,T):
        def heapify(A):
            start = (len(A) - 2) / 2
            while start >= 0:
                siftDown(A, start, len(A) - 1)
                start -= 1
    
        def siftDown(A, start, end):
            root = start
            while root * 2 + 1 <= end:
                child = root * 2 + 1
                if child + 1 <= end and T.count(A[child]) < T.count(A[child + 1]):
                    child += 1
                if child <= end and T.count(A[root]) < T.count(A[child]):
                    A[root], A[child] = A[child], A[root]
                    root = child
                else:
                    return
    
        heapify(A)
        end = len(A) - 1
        while end > 0:
            A[end], A[0] = A[0], A[end]
            siftDown(A, 0, end - 1)
            end -= 1
    
    
    if __name__ == '__main__':
        text = "the quick brown fox jumped over the the quick brown quick log log"
        heap = list(set(text.split()))
        print heap
    
        HeapSort(heap,text)
        print heap
    

    Output

    ['brown', 'log', 'jumped', 'over', 'fox', 'quick', 'the']
    ['jumped', 'fox', 'over', 'brown', 'log', 'the', 'quick']
    

    you can visualize the program here