python pandas numpy data-science fuzzy

How to return most common name from dataframe?


I am fuzzy matching two dataframes using fuzzywuzzy. I set a score cutoff of 75 and use process.extractOne to get the best match for each row.

Whenever no nickname reaches the cutoff, extractOne returns None, so the value for that row is None.

How do I replace 'None' with the most common name?

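from fuzzywuzzy import process

# Build the list of candidate nicknames once instead of once per row
nicknames = df2['NICKNAME'].tolist()

# extractOne returns a (nickname, score) tuple for the best match,
# or None when no nickname scores at least 75
df1['Matched_Nickname_and_Score'] = df1['FNAME'].apply(
    lambda x: process.extractOne(x, nicknames, score_cutoff=75))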

I have a way of finding the max value for each row, but I'm not sure where to go from here:

maxValuesObj = df1.max(axis=1)

Solution

  • Here is something that might help:

    df1['Matched_Nickname_and_Score'] = df1['Matched_Nickname_and_Score'].fillna(value=df1.FNAME.mode().values[0])
    

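    df1.FNAME.mode().values[0] returns the most common value in the FNAME column of df1. If several names are tied, mode() returns all of them in sorted order and [0] picks the first. pandas treats None in an object column as a missing value, so passing that name to fillna replaces every None in Matched_Nickname_and_Score.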
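    A minimal, self-contained sketch of the same idea on hypothetical toy data (the names, nicknames and scores below are made up; they stand in for df1 after the extractOne step). It shows that isna() sees the None rows and that fillna with the mode of FNAME fills them. Note that matched rows hold (nickname, score) tuples while filled rows get a plain string, so code that later unpacks the tuples needs to handle both shapes.

```python
import pandas as pd

# Hypothetical data standing in for df1 after process.extractOne:
# matched rows hold (nickname, score) tuples, unmatched rows hold None.
df1 = pd.DataFrame({
    "FNAME": ["Robert", "William", "Robert", "Xzavier"],
    "Matched_Nickname_and_Score": [("Bob", 90), ("Bill", 86), ("Bob", 90), None],
})

# pandas treats None in an object column as missing, so isna()/fillna() see it.
print(df1["Matched_Nickname_and_Score"].isna().sum())  # 1

# Most common first name; mode() returns ties in sorted order, iloc[0] takes the first.
most_common = df1["FNAME"].mode().iloc[0]

# Replace each None with that name; tuple rows are left unchanged.
df1["Matched_Nickname_and_Score"] = df1["Matched_Nickname_and_Score"].fillna(most_common)
print(df1)
```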