pythonfuzzy

Fuzzy match values against a list of lists in Python


I'm struggling with how to do this in a Pythonic way. I have a list of lists, which we can call names:

[('Jimmy', 'Smith'), ('James', 'Wilson'), ('Hugh', 'Laurie')]

And then I have two variables:

First_name = 'Jimm'

Last_name = 'Smitn'

I want to iterate through this list of lists of first and last names, fuzzy match each entry against these values, and return the entry that is closest to the specified First_name and Last_name.


Solution

  • You can implement fuzzy matching by using max() to pick the entry with the best match ratio, as computed by difflib.SequenceMatcher.

    To do this, pass a lambda as the key argument of max() that returns the match ratio for each entry. In my example I use SequenceMatcher.ratio(), but if performance is important you should also try SequenceMatcher.quick_ratio() and SequenceMatcher.real_quick_ratio(), which are faster but only return upper bounds on ratio().

    from difflib import SequenceMatcher
    
    lst = [('Jimmy', 'Smith'), ('James', 'Wilson'), ('Hugh', 'Laurie')]
    first_name = 'Jimm'
    last_name = 'Smitn'
    
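    # Fix the query as sequence a; each candidate is swapped in as sequence b.
    matcher = SequenceMatcher(a=first_name + ' ' + last_name)
    # set_seq2() returns None, so "or" falls through to matcher.ratio(),
    # which max() then uses as the comparison key.
    match_first_name, match_last_name = max(lst,
        key=lambda x: matcher.set_seq2(' '.join(x)) or matcher.ratio())
    
    print(first_name, last_name, '-', match_first_name, match_last_name)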
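If you also want the score, or want to reject weak matches, the same idea can be wrapped in a small function. This is only a sketch: the names best_match and cutoff are made up for illustration and are not part of difflib. It also swaps the roles of the two sequences, since the difflib documentation notes that SequenceMatcher caches information about its second sequence, so the fixed query is passed as b and each candidate is set with set_seq1().

```python
from difflib import SequenceMatcher


def best_match(names, first_name, last_name, cutoff=0.0):
    """Return (score, entry) for the closest entry, or None if nothing
    reaches the cutoff. Hypothetical helper, not a difflib API."""
    query = first_name + ' ' + last_name
    # The query stays fixed as sequence b; candidates rotate through a.
    matcher = SequenceMatcher(b=query)
    best = None
    for entry in names:
        matcher.set_seq1(' '.join(entry))
        score = matcher.ratio()  # similarity in the range [0, 1]
        if score >= cutoff and (best is None or score > best[0]):
            best = (score, entry)
    return best


names = [('Jimmy', 'Smith'), ('James', 'Wilson'), ('Hugh', 'Laurie')]
result = best_match(names, 'Jimm', 'Smitn', cutoff=0.6)
if result is not None:
    score, (match_first_name, match_last_name) = result
    print(match_first_name, match_last_name, round(score, 3))
```

With the sample data this picks ('Jimmy', 'Smith'). The cutoff of 0.6 is an arbitrary starting point; tune it against your own data, since a bare max() always returns something, even when no entry is a plausible match.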