I am using the fuzzywuzzy library to match strings against a reference list using Levenshtein distance. I want to apply this function to a Series, comparing each value with the entries in the reference list. If a value matches a reference entry at a defined ratio, the function returns the value from the reference list; otherwise it returns the original value from the Series.
The function looks like this:
import pandas as pd
from fuzzywuzzy import fuzz

ref_list = ['SOBEYS', 'WHOLE FOODS', 'FOODLAND', 'LOBLAWS', 'SAFEWAY']

def clean(row, ref_list):
    for ref in ref_list:
        simil = fuzz.ratio(row, ref)
        if (simil > 35):
            return ref
        elif (simil < 25):
            return row
I created the test DataFrame below and it works fine, but when I apply the function to the whole dataset I get TypeError: object of type 'float' has no len()
I can't figure out why it works in the sample dataset I created and not in the whole (original) dataset.
Any help is appreciated. Thank you in advance!
lis = ['FOODLAND',
       'THORNBURY FOODLAND',
       'JOANNE S PLACE NO WED DEL',
       'SOBEYS',
       'SOBEYS',
       'SOBEYS',
       'SOBEYS',
       'SOBEYS',
       'SOBEYS TIMBERLEA',
       'SOBEYS']
data = pd.DataFrame(lis, columns=['retailer'])
data['match'] = data['retailer'].apply(lambda x: clean(x, ref_list))
The error is fairly self-explanatory. Here's a way to reproduce it:
# sample data
f = pd.DataFrame({'col': ['SOBEYS ABC', 2.0]})
f['col'].apply(lambda x: fuzz.ratio(x, 'ABC'))
     43     @functools.wraps(func)
     44     def decorator(*args, **kwargs):
---> 45         if len(args[0]) == 0 or len(args[1]) == 0:
     46             return 0
     47         return func(*args, **kwargs)

TypeError: object of type 'float' has no len()
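To check which entries in the full dataset trigger this, you could list the values that are not strings. This is a minimal sketch in plain Python; `non_string_values` is a hypothetical helper name, not part of pandas or fuzzywuzzy, and the sample list is made up for illustration:

```python
def non_string_values(values):
    """Return (position, value) pairs for entries that are not str."""
    return [(i, v) for i, v in enumerate(values) if not isinstance(v, str)]

# Made-up sample: two strings, a plain float, and a NaN (also a float)
sample = ['SOBEYS ABC', 2.0, float('nan'), 'FOODLAND']
bad = non_string_values(sample)
print(bad)
```

Since a pandas Series is iterable, `non_string_values(data['retailer'])` should work the same way on the real column.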
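Basically, your column has a float value in it, and fuzz.ratio calls len() on its arguments, which fails for a float. In a large dataset this is often a NaN from a missing entry. One way to fix it is to convert the column to str: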
data['match'] = data['retailer'].astype(str).apply(lambda x: clean(x, ref_list))
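Note that astype(str) turns a missing value into the literal string 'nan', which then gets fuzzy-matched like any other name. If you would rather leave non-string values untouched, one alternative is to guard inside the function. The sketch below rests on some assumptions: `clean_safe` and its `scorer` and `threshold` parameters are hypothetical names, `difflib.SequenceMatcher` from the standard library stands in for fuzz.ratio so the example runs without extra installs, and the function falls back to the original value only after every reference has been checked, which is a guess at the intended behaviour rather than a copy of `clean`:

```python
from difflib import SequenceMatcher

ref_list = ['SOBEYS', 'WHOLE FOODS', 'FOODLAND', 'LOBLAWS', 'SAFEWAY']

def ratio(a, b):
    # Stand-in scorer (assumption): 0-100 similarity, like fuzz.ratio
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))

def clean_safe(value, ref_list, scorer=ratio, threshold=35):
    # Leave NaN, numbers, and other non-strings as they are
    if not isinstance(value, str):
        return value
    # Return the first reference that scores above the threshold
    for ref in ref_list:
        if scorer(value, ref) > threshold:
            return ref
    # No reference was close enough: keep the original value
    return value

print(clean_safe('SOBEYS TIMBERLEA', ref_list))
print(clean_safe(2.0, ref_list))
```

To use it on the column, something like `data['retailer'].apply(lambda x: clean_safe(x, ref_list, scorer=fuzz.ratio))` should work, passing fuzz.ratio in place of the stand-in.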