
How to get a substring using fuzzywuzzy from a single long string (without spaces)


I am trying to perform a fuzzy search on a single long "word" for a target word, using process.extract from thefuzz (fuzzywuzzy). However, I am getting single letters/numbers back instead of the partial match that fuzz.partial_ratio detects. How do I get the piece of the sequence that matches what partial_ratio detects?

snippet:

from thefuzz import fuzz, process

target = "1234"
source = "012345"
print(fuzz.partial_ratio(target, source))
print(process.extract(target, source, limit=2))

The first print yields a partial_ratio of 100, clearly indicating that the target is inside the source. However, extract returns:

[('1', 90), ('2', 90)]

as opposed to what I was expecting:

[('1234', 100)]

What am I doing wrong and/or misinterpreting?


Solution

  • To fix your issue, modify your last print line to the following:

    print(process.extract(target, [source], scorer=fuzz.partial_ratio))
    # [('012345', 100)]
    

    The reason your code doesn't work is twofold.

    1. The second argument to process.extract is supposed to be a list or dict-like collection of candidate strings. When you pass a plain string, the function iterates over it character by character, which is why you get '1', '2', etc.
    2. process.extract does not use partial_ratio by default; its default scorer is fuzz.WRatio.
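The first point can be seen without thefuzz at all. The snippet below is only an illustration of how Python iterates a string versus a one-element list; it does not call the library.

```python
source = "012345"

# Iterating a plain string yields its individual characters,
# so each character is treated as a separate candidate.
choices_from_string = list(source)
print(choices_from_string)

# Wrapping the string in a list yields one candidate: the whole string.
choices_from_list = list([source])
print(choices_from_list)
```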

    To solve 1), wrap source in a list:

    print(process.extract(target, [source], limit=2))
    # [('012345', 90)]
    

    Note that limit has no effect here because there is only one source string to rank. It only matters when there are multiple source strings to search in, where it caps how many of the best matches are returned, e.g.

    target = '1234'
    sources = ['012345', '324423', '0123567']
    print(process.extract(target, sources, scorer=fuzz.partial_ratio, limit=2))
    # [('012345', 100), ('0123567', 75)]
    

    To solve 2) and get your desired score, pass fuzz.partial_ratio as the scorer:

    print(process.extract(target, [source], scorer=fuzz.partial_ratio))
    # [('012345', 100)]
    

    Note that you don't get '1234' here. The result tells you which source string was compared against and how well your target matched it; process.extract returns the matching choice, not the matching substring or your target.
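If you do want the matching piece of the source itself, one option is to slide a window of len(target) over the source and keep the best-scoring slice. The sketch below is a minimal illustration using the standard library's difflib rather than thefuzz; best_window is a hypothetical helper name, and its scores only approximate what partial_ratio reports, so treat it as a starting point rather than a drop-in equivalent.

```python
from difflib import SequenceMatcher


def best_window(target, source):
    """Return (substring, score) for the slice of source most similar to target.

    Hypothetical helper: scores come from difflib, not from thefuzz.
    """
    n = len(target)
    # If the source is not longer than the target, compare them directly.
    if n == 0 or len(source) <= n:
        return source, round(100 * SequenceMatcher(None, target, source).ratio())

    best_sub, best_score = "", -1.0
    # Try every slice of source that has the same length as target.
    for start in range(len(source) - n + 1):
        window = source[start:start + n]
        score = SequenceMatcher(None, target, window).ratio()
        if score > best_score:  # ties keep the earliest window
            best_sub, best_score = window, score
    return best_sub, round(100 * best_score)


print(best_window("1234", "012345"))
# ('1234', 100)
```

This scans every window, so it costs more than a single partial_ratio call on long strings, but it returns the slice along with the score.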