python pandas fuzzywuzzy

Apply a defined function to a pandas column with fuzzywuzzy


I am using the fuzzywuzzy library to match strings against a reference list using Levenshtein distance. I want to apply a function to a Series that compares each value with the values in the reference list: depending on the similarity ratio, it returns either the matching value from the reference list or the original value from the Series.

The function looks like this:

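import pandas as pd
from fuzzywuzzy import fuzz

ref_list = ['SOBEYS', 'WHOLE FOODS', 'FOODLAND', 'LOBLAWS', 'SAFEWAY']

def clean(row, ref_list):
    # Compare the value with each reference name, in list order
    for ref in ref_list:
        simil = fuzz.ratio(row, ref)
        if simil > 35:
            # Similar enough: use the reference name
            return ref
        elif simil < 25:
            # Clearly different: keep the original value
            return row
    # Reached only if every score is between 25 and 35; returns None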
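Because clean() returns on the first reference whose score falls outside the 25-35 band, later references may never be compared. A minimal sketch of a variant that scores every reference and keeps the best one is shown here. It is an illustration under stated assumptions, not the asker's method: ratio() is a hypothetical stand-in built on the standard library's difflib.SequenceMatcher so the sketch runs without fuzzywuzzy (fuzz.ratio can be swapped in), and best_match() simplifies the 25/35 thresholds to a single cutoff of 35.

```python
from difflib import SequenceMatcher


def ratio(a, b):
    # Assumption: stand-in for fuzz.ratio, difflib similarity scaled to 0-100
    return round(100 * SequenceMatcher(None, a, b).ratio())


def best_match(value, ref_list, threshold=35):
    # Non-strings (e.g. NaN) are returned unchanged instead of being scored
    if not isinstance(value, str):
        return value
    # Score every reference first, then decide once on the best one
    best_ref = max(ref_list, key=lambda ref: ratio(value, ref))
    if ratio(value, best_ref) > threshold:
        return best_ref
    return value


ref_list = ['SOBEYS', 'WHOLE FOODS', 'FOODLAND', 'LOBLAWS', 'SAFEWAY']
print(best_match('SOBEYS TIMBERLEA', ref_list))
print(best_match('THORNBURY FOODLAND', ref_list))
```

With a DataFrame, this would be applied the same way as clean(), e.g. `data['retailer'].apply(lambda x: best_match(x, ref_list))`.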

I created the test DataFrame below and the function works fine on it. But when I apply it to the whole dataset, I get TypeError: object of type 'float' has no len().

I can't figure out why it works on the sample dataset I created but not on the whole (original) dataset.

Any help is appreciated. Thank you in advance!

lis = ['FOODLAND',
 'THORNBURY FOODLAND',
 'JOANNE S PLACE NO WED DEL',
 'SOBEYS',
 'SOBEYS',
 'SOBEYS',
 'SOBEYS',
 'SOBEYS',
 'SOBEYS TIMBERLEA',
 'SOBEYS']


data = pd.DataFrame(lis, columns=['retailer'])

data['match'] = data['retailer'].apply(lambda x: clean(x, ref_list))
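One way to see why the full dataset fails while the sample works is to look for values in the retailer column that are not strings. Missing entries (for example, empty cells read by pd.read_csv) are stored as NaN, which is a float. A minimal sketch, using a hypothetical four-row DataFrame in place of the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the full dataset: one missing name, one number
data = pd.DataFrame({'retailer': ['FOODLAND', np.nan, 'SOBEYS', 12.0]})

# Rows whose value is not a str are the ones fuzz.ratio cannot handle
not_str = data['retailer'].map(lambda v: not isinstance(v, str))
print(data[not_str])

# Count the Python type of each value; NaN shows up as 'float'
counts = data['retailer'].map(lambda v: type(v).__name__).value_counts()
print(counts)
```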



Solution

  • The error seems pretty self-explanatory. Here's a way to reproduce it:

    import pandas as pd
    from fuzzywuzzy import fuzz

    # sample data: one string and one float in the same column
    f = pd.DataFrame({'col': ['SOBEYS ABC', 2.0]})
    f['col'].apply(lambda x: fuzz.ratio(x, 'ABC'))
    
         43 @functools.wraps(func)
         44 def decorator(*args, **kwargs):
    ---> 45     if len(args[0]) == 0 or len(args[1]) == 0:
         46         return 0
         47     return func(*args, **kwargs)
    
    TypeError: object of type 'float' has no len()
    
    

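    Basically, your column has at least one float value, most likely a missing value (NaN), which pandas stores as a float. As the traceback shows, fuzz.ratio calls len() on both arguments, and a float has no length. One way to fix it is to convert the column to str first: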

    data['match'] = data['retailer'].astype(str).apply(lambda x: clean(x, ref_list))
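Depending on the pandas version, astype(str) may turn missing values into the literal string 'nan', which clean() then scores against the reference list like any other name. If missing values should stay missing, one option is to skip non-string values inside the lambda. A minimal sketch, assuming a hypothetical three-row DataFrame and a difflib-based stand-in for fuzz.ratio so it runs without fuzzywuzzy installed:

```python
from difflib import SequenceMatcher

import numpy as np
import pandas as pd


def ratio(a, b):
    # Assumption: stand-in for fuzz.ratio, difflib similarity scaled to 0-100
    return round(100 * SequenceMatcher(None, a, b).ratio())


def clean(row, ref_list):
    # Same threshold logic as the question's clean(), using the stand-in ratio
    for ref in ref_list:
        simil = ratio(row, ref)
        if simil > 35:
            return ref
        elif simil < 25:
            return row


ref_list = ['SOBEYS', 'WHOLE FOODS', 'FOODLAND', 'LOBLAWS', 'SAFEWAY']

# Hypothetical data with one missing retailer name
data = pd.DataFrame({'retailer': ['SOBEYS', np.nan, 'FOODLAND']})

# Only strings are scored; NaN (or any other non-string) passes through unchanged
data['match'] = data['retailer'].apply(
    lambda x: clean(x, ref_list) if isinstance(x, str) else x
)
print(data)
```

Compared with astype(str), this keeps missing rows detectable afterwards with isna().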