pythondata-structureslogicheap

Finding the K-th largest element using heap


I am trying to solve the leetcode problem: kth-largest-element-in-an-array

I know a way to solve this is by using a heap. However, I wanted to implement my own heapify method for practice, and here is my code:

def findKthLargest(self, nums: List[int], k: int) -> int:
    
    def heapify(nums: List[int], i: int):
        print(nums, i)
        
        largest = i
        left = (2 * i) + 1
        right = (2 * i) + 2
        
        if left < len(nums) and nums[largest] < nums[left]:
            largest = left
        if right < len(nums) and nums[largest] < nums[right]:
            largest = right
        
        if largest != i:
            nums[i], nums[largest] = nums[largest], nums[i]
            print(nums)
            heapify(nums, largest)
    
    print(nums)
    for i in range(len(nums)-1, -1, -1):
        heapify(nums, i)
    print(nums)
    
    return nums[k-1]

My code is basically following the implementation given in an editorial:

    def max_heapify(heap_size, index):
        left, right = 2 * index + 1, 2 * index + 2
        largest = index
        if left < heap_size and lst[left] > lst[largest]:
            largest = left
        if right < heap_size and lst[right] > lst[largest]:
            largest = right
        if largest != index:
            lst[index], lst[largest] = lst[largest], lst[index]
            max_heapify(heap_size, largest)

    # heapify original lst
    for i in range(len(lst) // 2 - 1, -1, -1):
        max_heapify(len(lst), i)

And this worked for 21/41 test cases and is failing Input:

nums = [3,2,3,1,2,4,5,5,6]
k = 4

My code is returning 3 instead of 4. Here is my output:

[3, 2, 3, 1, 2, 4, 5, 5, 6]
[3, 2, 3, 1, 2, 4, 5, 5, 6] 8
[3, 2, 3, 1, 2, 4, 5, 5, 6] 7
[3, 2, 3, 1, 2, 4, 5, 5, 6] 6
[3, 2, 3, 1, 2, 4, 5, 5, 6] 5
[3, 2, 3, 1, 2, 4, 5, 5, 6] 4
[3, 2, 3, 1, 2, 4, 5, 5, 6] 3
[3, 2, 3, 6, 2, 4, 5, 5, 1]
[3, 2, 3, 6, 2, 4, 5, 5, 1] 8
[3, 2, 3, 6, 2, 4, 5, 5, 1] 2
[3, 2, 5, 6, 2, 4, 3, 5, 1]
[3, 2, 5, 6, 2, 4, 3, 5, 1] 6
[3, 2, 5, 6, 2, 4, 3, 5, 1] 1
[3, 6, 5, 2, 2, 4, 3, 5, 1]
[3, 6, 5, 2, 2, 4, 3, 5, 1] 3
[3, 6, 5, 5, 2, 4, 3, 2, 1]
[3, 6, 5, 5, 2, 4, 3, 2, 1] 7
[3, 6, 5, 5, 2, 4, 3, 2, 1] 0
[6, 3, 5, 5, 2, 4, 3, 2, 1]
[6, 3, 5, 5, 2, 4, 3, 2, 1] 1
[6, 5, 5, 3, 2, 4, 3, 2, 1]
[6, 5, 5, 3, 2, 4, 3, 2, 1] 3
[6, 5, 5, 3, 2, 4, 3, 2, 1]

I see that 4 in index 5 is never being sorted after the initial few iterations. Why is this happening? What am I missing? Any help would be appreciated.


Solution

  • There are a few problems with your (partial) heap-sort algorithm:

    1. The loop should start from the last non-leaf node (a node with at least one child) and move to the root. In your code, the loop starts from the last element.

      • Starting from the last element won't help in building the heap correctly because leaf nodes (which include the last elements) already satisfy the heap property and don’t need to be heapified.
    2. After building the max heap, the largest element is at the root. We then:

      1. Swap it with the last element in the heap.

      2. Reduce the heap size by one and restore the heap by calling heapify on the root node.

      3. Repeat this process k-1 times.

    After that, the kth largest element is at the root of the heap:

    def findKthLargest(self, nums: List[int], k: int) -> int:
        # added heap_size parameter
        def heapify(nums: List[int], i: int, heap_size: int):
            print(nums, i)
    
            largest = i
            left = (2 * i) + 1
            right = (2 * i) + 2
    
            if left < heap_size and nums[largest] < nums[left]:
                largest = left
            if right < heap_size and nums[largest] < nums[right]:
                largest = right
    
            if largest != i:
                nums[i], nums[largest] = nums[largest], nums[i]
                print(nums)
                heapify(nums, largest, heap_size)
    
        print(nums)
        # starting from the last non-leaf node
        for i in range(len(nums) // 2 - 1, -1, -1):
            heapify(nums, i, len(nums))
        print(nums)
    
        # performing k - 1 swaps
        for i in range(len(nums) - 1, len(nums) - k, -1):
            nums[0], nums[i] = nums[i], nums[0]
            heapify(nums, 0, i)  # heapify the reduced heap
    
        return nums[0]  # kth largest element is at the root