[SOLVED] Fuzzy calculation?

I would like to know how to perform a fuzzy evaluation/calculation. I found that scikit-fuzzy might be useful, but I can't find a function for building a consistent fuzzy matrix. I assume there is some data platform or Python code that can do this automatically. Can anybody help me?

Solution

The code I use is part of the RapidFuzz package, which computes string similarity. Here's a link that might be helpful:

https://maxbachmann.github.io/RapidFuzz/Usage/process.html
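For intuition about the numbers RapidFuzz produces: its documentation describes fuzz.ratio as the normalized Indel similarity, which works out to 100 * 2 * LCS / (len(a) + len(b)), where LCS is the length of the longest common subsequence of the two strings. Below is a minimal pure-Python sketch of that formula, meant only for understanding the scores. The helpers lcs_length, ratio and pairwise_ratio are hypothetical names, not part of RapidFuzz, and this is far slower than process.cdist. Treating two empty strings as a perfect match is an assumption of this sketch.

```python
from typing import List, Sequence


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence (classic dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[len(b)]


def ratio(a: str, b: str) -> float:
    """Normalized Indel similarity on a 0-100 scale (hypothetical re-implementation)."""
    total = len(a) + len(b)
    if total == 0:
        return 100.0  # assumption: two empty strings count as identical
    return 100.0 * 2 * lcs_length(a, b) / total


def pairwise_ratio(queries: Sequence[str], choices: Sequence[str]) -> List[List[float]]:
    """Score every query against every choice, like a (much slower) cdist."""
    return [[ratio(q, c) for c in choices] for q in queries]


names = ["alice", "alicia", "bob"]
matrix = pairwise_ratio(names, names)
for row in matrix:
    print([round(x, 1) for x in row])
```

Running it prints a small symmetric matrix with 100 on the diagonal, the same shape of result that process.cdist returns when a column is compared with itself.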

When comparing one column of strings to itself, I use this code to generate the similarity matrix:

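from rapidfuzz import process, fuzz
strings1 = df['usernames']  # column of strings from an existing pandas DataFrame
# pairwise fuzz.ratio scores (0-100); workers=-1 uses all available CPU cores
C = process.cdist(strings1, strings1, scorer=fuzz.ratio, workers=-1)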

Output:

array([[100.      ,  22.222221,  19.047619, ...,  21.052631,  26.666666,
         11.764706],
       [ 22.222221, 100.      ,  21.052631, ...,  23.529411,  15.384615,
         13.333333],
       [ 19.047619,  21.052631, 100.      , ...,  30.      ,  12.5     ,
         22.222221],
       ...,
       [ 21.052631,  23.529411,  30.      , ..., 100.      ,  14.285714,
         25.      ],
       [ 26.666666,  15.384615,  12.5     , ...,  14.285714, 100.      ,
         33.333332],
       [ 11.764706,  13.333333,  22.222221, ...,  25.      ,  33.333332,
        100.      ]], dtype=float32)
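Because the column is compared with itself, the matrix is symmetric and its diagonal is 100 (each string matches itself). A common next step, which is my assumption about the use case and not part of the original answer, is to ignore the diagonal and list pairs scoring above a threshold as candidate duplicates. A minimal NumPy sketch, using a small made-up score matrix as a stand-in for C and a hypothetical cutoff of 80:

```python
import numpy as np

# Made-up stand-in for C (symmetric, 100 on the diagonal), for illustration only
C = np.array(
    [
        [100.0, 85.7, 20.0, 12.5],
        [85.7, 100.0, 22.2, 14.3],
        [20.0, 22.2, 100.0, 90.9],
        [12.5, 14.3, 90.9, 100.0],
    ],
    dtype=np.float32,
)
threshold = 80.0  # hypothetical cutoff; tune it for your data

# Upper triangle only (k=1 skips the diagonal), so each pair is reported once
i, j = np.triu_indices_from(C, k=1)
scores = C[i, j]
mask = scores >= threshold
pairs = list(zip(i[mask].tolist(), j[mask].tolist(), scores[mask].tolist()))
print(pairs)
```

Using the upper triangle avoids reporting both (a, b) and (b, a); the row/column indices map back to positions in the original column.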

This is also a lot faster than FuzzyWuzzy, since the core of RapidFuzz is implemented in C++. Hope this helps!