python fuzzywuzzy

Fuzzy calculation?


I would like to know how to perform a fuzzy evaluation/calculation. I found that scikit-fuzzy might be useful, but I can't find a function for a consistent fuzzy matrix. I assume there is some data platform or Python code that can do this automatically. Can anybody help me?


Solution

  • The code that I use is part of the RapidFuzz package, which also computes string similarity. Here's a link that might be helpful:

    https://maxbachmann.github.io/RapidFuzz/Usage/process.html

    This is the code I use to generate a matrix when comparing one column of strings to itself:

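    from rapidfuzz import fuzz, process
    # df is assumed to be an existing pandas DataFrame with a 'usernames' column
    strings1 = df['usernames']
    C = process.cdist(strings1, strings1, scorer=fuzz.ratio, workers=-1)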
    
    

    Output:

    array([[100.      ,  22.222221,  19.047619, ...,  21.052631,  26.666666,
             11.764706],
           [ 22.222221, 100.      ,  21.052631, ...,  23.529411,  15.384615,
             13.333333],
           [ 19.047619,  21.052631, 100.      , ...,  30.      ,  12.5     ,
             22.222221],
           ...,
           [ 21.052631,  23.529411,  30.      , ..., 100.      ,  14.285714,
             25.      ],
           [ 26.666666,  15.384615,  12.5     , ...,  14.285714, 100.      ,
             33.333332],
           [ 11.764706,  13.333333,  22.222221, ...,  25.      ,  33.333332,
            100.      ]], dtype=float32)
    

    This is also a lot faster than using FuzzyWuzzy, since RapidFuzz is implemented in C++. Hope this helps.
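
In case it helps to see what each entry of that matrix means: as far as I know, fuzz.ratio is a normalized Indel similarity, i.e. 100 * 2 * LCS(a, b) / (len(a) + len(b)), where LCS is the length of the longest common subsequence. Below is a minimal pure-Python sketch of that idea. It is not RapidFuzz's implementation and will be far slower; the function names (lcs_length, indel_ratio, similarity_matrix) and the sample names are made up for illustration.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(a: str, b: str) -> float:
    """Normalized Indel similarity on a 0-100 scale."""
    total = len(a) + len(b)
    if total == 0:
        # Assumption for this sketch: two empty strings count as identical.
        return 100.0
    return 100.0 * 2 * lcs_length(a, b) / total


def similarity_matrix(strings):
    """Compare every string with every other string (and with itself)."""
    return [[indel_ratio(s, t) for t in strings] for s in strings]


# Hypothetical sample data standing in for df['usernames'].
names = ["alice", "alicia", "bob"]
matrix = similarity_matrix(names)
for name, row in zip(names, matrix):
    print(name, [round(score, 2) for score in row])
```

The diagonal is always 100 because each string is compared with itself, and the matrix is symmetric, which matches the shape of the output above. For real data you would still want process.cdist, since this sketch does all the work in plain Python loops.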