pythonlistperformancesortingheap

Efficient list sorting: Using heap instead of standard sorting is slower


I'm trying to create a more efficient way to sort lists and dictionaries in python and came across Efficient data structure keeping objects sorted on multiple keys. There the suggested solution was to use the heapq module.

However, in my tests a heap seems to be twice as slow as the native Python sorting algorithm. Below is the code I used to do a simple test. Results were for example:

Heap: 0.005993366241455078

Standard: 0.0020036697387695312

Is there a way to actually use a heap and increase the performance, as the post linked above claims? What would that code look like?

Here's the code to test it:

import  random
import time
from heapq import *

standardlist = []
heaplist = []
for i in range(10000):
    num = random.randint(0,10000)
    standardlist.append(num)
    heappush(heaplist, num)

# Standard sorting method:
start_time = time.time()
sorted_list = sorted(standardlist)
finish_time_1 = time.time() - start_time

# Heap sorting method:
start_time = time.time()
heap_sorted_list = [heappop(heaplist) for i in range(len(heaplist))]
finish_time_2 = time.time() - start_time

print("Standard Finish Time:", finish_time_1)
print("Heap Finish Time:", finish_time_2)

Solution

  • A heap data structure may be the right solution when you have a collection that changes over time (through insertions and deletions), and at each moment you want to have fast access to the minimum entry that is currently in the collection, and possibly extract it. There were such requirements in the Q&A you linked to.

    If however the goal is just to sort a dataset once, then using a heap is not the most efficient.

    Some remarks on your code: