I would like to know how to perform fuzzy evaluation/calculation. I found that scikit-fuzzy might be useful, but I can't find a function for building a consistent fuzzy matrix. I assume there is some data platform or Python code that can do this automatically. Can anybody help me?
The code that I use is part of the RapidFuzz package, which also computes string similarity. Here's a link that might be helpful:
https://maxbachmann.github.io/RapidFuzz/Usage/process.html
When I am comparing one column of strings to itself, this is the code I use to generate the matrix:
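from rapidfuzz import process, fuzz
strings1 = df['usernames']  # df: an existing pandas DataFrame with a column of strings
# Pairwise fuzz.ratio scores (0-100) for every pair; workers=-1 uses all CPU cores
C = process.cdist(strings1, strings1, scorer=fuzz.ratio, workers=-1)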
Output:
array([[100.      ,  22.222221,  19.047619, ...,  21.052631,  26.666666,
         11.764706],
       [ 22.222221, 100.      ,  21.052631, ...,  23.529411,  15.384615,
         13.333333],
       [ 19.047619,  21.052631, 100.      , ...,  30.      ,  12.5     ,
         22.222221],
       ...,
       [ 21.052631,  23.529411,  30.      , ...,  100.      ,  14.285714,
         25.      ],
       [ 26.666666,  15.384615,  12.5     , ...,  14.285714, 100.      ,
         33.333332],
       [ 11.764706,  13.333333,  22.222221, ...,  25.      ,  33.333332,
        100.      ]], dtype=float32)
This is also a lot faster than using FuzzyWuzzy, since RapidFuzz is implemented in C++. Hope this helps.
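In case it helps to see where the numbers in such a matrix come from, here is a rough pure-Python sketch of an all-pairs similarity matrix. It assumes the scorer is a normalized Indel similarity (insertions and deletions only, scaled to 0-100), which is how the RapidFuzz documentation describes fuzz.ratio. The function names and the sample usernames are made up for illustration, and this is far slower than process.cdist, so treat it as an explanation rather than a replacement.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_ratio(a: str, b: str) -> float:
    """Normalized Indel similarity on a 0-100 scale.

    Indel distance is len(a) + len(b) - 2 * LCS, so the similarity
    1 - distance / (len(a) + len(b)) simplifies to 2 * LCS / (len(a) + len(b)).
    """
    total = len(a) + len(b)
    if total == 0:
        return 100.0  # assumption: two empty strings count as identical
    return 100.0 * 2 * lcs_length(a, b) / total


def similarity_matrix(strings):
    """Symmetric matrix of pairwise scores; the diagonal is always 100."""
    n = len(strings)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 100.0
        for j in range(i + 1, n):
            # The score is symmetric, so compute each pair only once.
            matrix[i][j] = matrix[j][i] = indel_ratio(strings[i], strings[j])
    return matrix


usernames = ["alice", "alicia", "bob"]  # hypothetical sample data
for row in similarity_matrix(usernames):
    print([round(score, 2) for score in row])
```

The matrix is symmetric with 100 on the diagonal, which matches the shape of the output above when a column is compared to itself.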