I've been searching but haven't found how to do this in refine.
I've got two columns of unique IDS. For each a in A, I want to find the top 10 closest matches in B.
My backup plan is to just use Levenshtein to iterate ... but Refine has such a nice iterface and many more algorithms implemented that I was hoping to be able to do some of the work using it.
Or is there another tool for doing this?
Did you know you can use clustering algorithm like fingerprint or ngramFingerprint (source) out of the clustering interface in Refine?
Using you IDS field, create a new column based on this column with the following expression: ngramFingerprint(value)
You can now cross with your other data set on this new column. This might help to get more matches.