python list function sorting data-structures

How to get top 3 tuples from two lists based on a sorting by float and string values


I have 2 lists

lst1 = [
    0.9932, 0.9982, 0.9979, 0.9981, 0.9993, 0.9985,
    0.9924, 0.9984, 0.9987, 0.9967, 0.995, 0.9932
]

and

lst2 = [
    "Jane", "Tylor", "James", "Tom", "Smith", "Johnson", 
    "Williams", "Jones", "Brown", "Davis", "Miller", "Wilson"
]

I want to build a list of 3 tuples, each containing 2 items. The second item in each tuple is a floating-point value from lst1, with the tuples arranged in descending order of that value (the top 3 values from lst1), and the first item is the name from lst2 corresponding to that value. So essentially I want

[("Smith", 0.9993), ("Brown", 0.9987), ("Johnson", 0.9985)]

However, if the highest values from lst1 are tied, I want to break the tie by sorting on the name in alphabetical order. For example: ("Abby", 0.9994) should come before ("Bob", 0.9994)

I have tried:

sorted(zip(lst1, lst2), reverse=True)[:3]

which gives me

[(0.9993, "Smith"), (0.9987, "Brown"), (0.9985, "Johnson")]

i.e. the lst1 item comes before the lst2 item in each tuple. If I swap lst1 and lst2 in zip, it still does not work.


Solution

  • Swap the order of lst1 and lst2 in zip and pass a sorting key to sorted. sorted sorts in ascending order by default, so to sort by score in descending order while still sorting names in ascending order, negate the float value in the key.

    lst1 = [
        0.9932, 0.9982, 0.9979, 0.9981, 0.9993, 0.9985,
        0.9924, 0.9984, 0.9987, 0.9967, 0.995, 0.9932
    ]
    lst2 = [
        "Jane", "Tylor", "James", "Tom", "Smith", "Johnson", 
        "Williams", "Jones", "Brown", "Davis", "Miller", "Wilson"
    ]
    
    # sort in descending order by score, alphabetically by name
    result = sorted(zip(lst2, lst1), key=lambda x: (-x[1], x[0]))[:3]
    

    Note that if the lists are much longer than 3 (the desired number of results), it may be more efficient to use heapq.nsmallest from the standard library, because it keeps track of only 3 values at a time. The sorting key is the same.

    import heapq
    result = heapq.nsmallest(3, zip(lst2, lst1), key=lambda x: (-x[1], x[0]))
    

    Both result in the following output:

    [('Smith', 0.9993), ('Brown', 0.9987), ('Johnson', 0.9985)]
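
The sample data has no tie among the top scores, so it does not exercise the tie-break rule from the question. The sketch below uses made-up data (the names `scores`, `names`, and the helper `top_n` are illustrative and not part of the original post) to show how the key `(-score, name)` behaves when two entries share the highest score, and why `reverse=True` on its own falls short in that case.

```python
import heapq

# Hypothetical data with a tie at the top score (not from the original question).
scores = [0.9994, 0.9981, 0.9994, 0.9987]
names = ["Bob", "Tom", "Abby", "Brown"]


def top_n(names, scores, n=3):
    """Return the n (name, score) pairs with the highest scores.

    Ties on score are broken by name in ascending (A-Z) order.
    """
    # Negating the score turns the default ascending sort into a
    # descending sort on score, while names stay ascending.
    return sorted(zip(names, scores), key=lambda pair: (-pair[1], pair[0]))[:n]


top = top_n(names, scores)
print(top)
# [('Abby', 0.9994), ('Bob', 0.9994), ('Brown', 0.9987)]

# heapq.nsmallest accepts the same key and returns the same pairs.
top_heap = heapq.nsmallest(3, zip(names, scores), key=lambda pair: (-pair[1], pair[0]))
print(top_heap == top)
# True

# For comparison: reverse=True reverses the whole tuple ordering,
# so tied scores end up with names in Z-A order.
naive = sorted(zip(scores, names), reverse=True)[:3]
print(naive)
# [(0.9994, 'Bob'), (0.9994, 'Abby'), (0.9987, 'Brown')]
```

With the tied pair, the key-based version puts "Abby" before "Bob", whereas the `reverse=True` attempt puts "Bob" first, because reversing applies to the name comparison as well as the score comparison.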