pythonsortingheapq

Sorting heapq with second element in lexicographical order is failing


I have a list of tuples: [(5, 'i love'), (2, 'island'), (5, 'iz')].

I want to perform a two-step sorting process to extract a certain number of tuples (let's say n, which is 3 in this example) from the list:

  1. Sort based on the first element: Sort the tuples in descending order based on the value of the first element. This means tuples with higher first element values will come first.

  2. If the first element is the same: In case two or more tuples have the same first element value, sort these tuples based on the second element (string) in lexicographical (alphabetical) order.

Applying these rules to my example:

Sort based on the first element: (5, 'i love'), (5, 'iz'), (2, 'island'). Within the tuples having the same first element (5), sort based on the second element lexicographically: (5, 'i love'), (5, 'iz'). So, the final result after this sorting process should be: [(5, 'i love'), (5, 'iz'), (2, 'island')].

import heapq
ls = [(5, 'i love'), (2, 'island'), (5, 'iz')]
res = heapq.nlargest(3, ls, key = lambda x: (x[0], x[1]))
print(res)

Actual Output: [(5, 'iz'), (5, 'i love'), (2, 'island')] Expected Output: [(5, 'i love'), (5, 'iz'), (2, 'island')]


Solution

  • heapq.nlargest returns the results in descending order which means in this case when the first key is the same, it will return the larger second key first. Since you want the second key to be in lexicographical order (ascending order), and your first key is a number, you can use the following workaround:

    res = heapq.nsmallest(3, ls, key = lambda x: (-x[0], x[1]))
    

    Since now the first key is -ve of the original value, ascending order using heapq.nsmallest gives the correct output.