pythonheapq

Why is this python priority queue failing to heapify?


Why is this priority queue failing to heapify? Where (150, 200, 200) are the priority values assigned to the dictionaries

import heapq

priority_q = [
    (150, {'intel-labels': {'timestamp': 150}}), 
    (200, {'intel-labels': {'timestamp': 200}}), 
    (200, {'intel-labels': {'timestamp': 200, 'xx': 'xx'}})
]

heapq.heapify(priority_q)

print( heapq.nlargest(2, priority_q))

The exception:

Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   TypeError: '<' not supported between instances of 'dict' and 'dict'

The below, however, works..

priority_q = [
    (150, {'intel-labels': {'timestamp': 150}}), 
    (200, {'intel-labels': {'timestamp': 200}}), 
    (201, {'intel-labels': {'timestamp': 200, 'xx': 'xx'}})
]

heapq.heapify(priority_q)

Why is this?


Solution

  • Why is this?

    heapq heapifies the heap according to comparisons. Tuples are compared lexicographically. If your first values in all tuples are distinct, you will never compare the second values. This is the case for your second example. If you have a duplicate value (200 in your first example), the second elements will be compared. These are dicts (which can't be compared), so this raises an error.

    As for a proper fix: The heapq docs got you covered. Their suggestion is to use triples such that the second value is an autoincremented tie breaker, or to use a data class that only compares the priority field.

    Note that these approaches differ slightly: With a tie breaker, there won't be any ties; nlargest will always give you just a single element - the one which was inserted first. If you want it to return all tying elements, you shouldn't use this approach.

    Applying the second approach to your example, the following works:

    import heapq
    from dataclasses import dataclass, field
    from typing import Any
    
    @dataclass(order=True)
    class PrioritizedItem:
        priority: int
        item: Any=field(compare=False)
    
    priority_q = [
        PrioritizedItem(150, {'intel-labels': {'timestamp': 150}}), 
        PrioritizedItem(200, {'intel-labels': {'timestamp': 200}}), 
        PrioritizedItem(200, {'intel-labels': {'timestamp': 200, 'xx': 'xx'}})
    ]
    heapq.heapify(priority_q)
    print(heapq.nlargest(2, priority_q))
    

    prints

    [PrioritizedItem(priority=200, item={'intel-labels': {'timestamp': 200}}), PrioritizedItem(priority=200, item={'intel-labels': {'timestamp': 200, 'xx': 'xx'}})]