I want to find the similarity between two strings. Example:
string1 = "One"
string2 = "one"
I expect the answer to be between 0 and 1. For the two strings above, we get 1. Right now I'm using Jellyfish, a Python module that provides the jaro_distance() function. The downside is that I can only compare strings that contain English words and special characters. I also want to compare strings in other languages, say Punjabi:
string1 = "ਬੁੱਧਵਾਰ"
string2 = "ਬੁੱਧਵਾ"
I tried the same jaro_distance() function, but I'm getting:
>>> score = jellyfish.jaro_distance(unicode(string1), unicode(string2))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)
I tried encoding and decoding the strings before passing them to the function. Is there a way to use jaro_distance() with other languages, or is there another module or function available for this? Can you help me with this?
You can use SequenceMatcher from the built-in difflib module. Its ratio() method returns a float between 0 and 1.
Code example:
import difflib
print(difflib.SequenceMatcher(None, "ਬੁੱਧਵਾਰ", "ਬੁੱਧਵਾ").ratio())
Output:
0.9230769230769231
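The error in the question most likely comes from unicode(string1) in Python 2 decoding the byte string with the default ASCII codec, so decoding explicitly as UTF-8 should avoid it. Below is a minimal sketch, assuming Python 3 and UTF-8 input. The helper names to_text and similarity, and the ignore_case flag, are made up for illustration and are not part of difflib or Jellyfish. The NFC normalization step is an extra assumption, added so that visually identical strings built from different code point sequences compare as equal.

```python
import difflib
import unicodedata


def to_text(value, encoding="utf-8"):
    """Return value as a Unicode string, decoding bytes explicitly.

    Hypothetical helper: the UTF-8 default is an assumption about the input.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def similarity(a, b, ignore_case=False):
    """Return a SequenceMatcher ratio in [0, 1] for two strings.

    Hypothetical helper: normalizes both inputs to NFC first and
    optionally case-folds them before comparing.
    """
    a = unicodedata.normalize("NFC", to_text(a))
    b = unicodedata.normalize("NFC", to_text(b))
    if ignore_case:
        a, b = a.casefold(), b.casefold()
    return difflib.SequenceMatcher(None, a, b).ratio()


print(similarity("One", "one", ignore_case=True))  # prints 1.0
# Bytes input is decoded as UTF-8 before comparison.
print(similarity("ਬੁੱਧਵਾਰ".encode("utf-8"), "ਬੁੱਧਵਾ"))
```

Note that SequenceMatcher is case-sensitive by default, so "One" and "one" only score 1 when ignore_case=True is passed. The comparison works on code points, so a single visible Punjabi character made of several combining marks counts as several units.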