metaphone soundex algorithm

Double-metaphone errors


I'm using Lawrence Philips' Double Metaphone algorithm with great success, but I have found the odd unexpected result for some combinations.

Does anyone have additions or changes to the algorithm that they wouldn't mind sharing, or just the combinations they've found that don't work as expected?

E.g., I had issues between:


Solution

  • All Soundex, Metaphone, and variant schemes will occasionally give results that differ from what you expect. This is unavoidable: they are essentially simple hash algorithms designed to preserve phonetic information, so they will sometimes produce collisions when you'd rather they didn't, and sometimes produce different codes for words you'd rather they matched.

    One possible way of improving things is to use 'synonym rings'. A synonym ring is a list of words that should be treated as synonyms regardless of spelling. I encountered them in the context of name matching. For example, the variants of Chaudri included:

    CHAUDARY CHAUDERI CHAUDERY CHAUDHARY CHAUDHERI CHAUDHERY CHAUDHRI CHAUDHRY CHAUDHURI CHAUDHURY CHAUDHY CHAUDREY CHAUDRI CHAUDRY CHAUDURI CHAWDHARY CHAWDHRY CHAWDHURY CHDRY CHODARY CHODHARI CHODHOURY CHODHRY CHODREY CHODRY CHODURY CHOUDARI CHOUDARY CHOUDERY CHOUDHARI CHOUDHARY CHOUDHERY CHOUDHOURY CHOUDHRI CHOUDHRY CHOUDHURI CHOUDHURY CHOUDREY CHOUDRI CHOUDRY CHOUDURY CHOUWDHRY CHOWDARI CHOWDARY CHOWDHARY CHOWDHERY CHOWDHRI CHOWDHRY CHOWDHURI CHOWDHURRYY CHOWDHURY CHOWDORY CHOWDRAY CHOWDREY CHOWDRI CHOWDRURY CHOWDRY CHOWDURI CHOWDURY CHUDARY CHUDHRY CHUDORY COWDHURY
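
A synonym ring lookup can be sketched roughly as follows. This is only an illustration, not a reference implementation: the names `build_ring_index` and `same_name` are made up for this example, the ring is a small subset of the list above, and the optional `fallback` argument stands in for whatever phonetic encoder you already use (for instance a Double Metaphone function). The ring is consulted first, and the phonetic code is used only for names that appear in no ring.

```python
from typing import Callable, Dict, Iterable, Optional

# A small subset of the Chaudri ring above; a real ring would list every variant.
CHAUDRI_RING = ["CHAUDRI", "CHAUDHRY", "CHOUDHURY", "CHOWDHURY", "CHDRY", "COWDHURY"]


def normalize(name: str) -> str:
    """Trim whitespace and upper-case so lookups ignore case."""
    return name.strip().upper()


def build_ring_index(rings: Iterable[Iterable[str]]) -> Dict[str, str]:
    """Map every spelling to the first (canonical) member of its ring."""
    index: Dict[str, str] = {}
    for ring in rings:
        members = [normalize(member) for member in ring]
        if not members:
            continue
        canonical = members[0]
        for member in members:
            index[member] = canonical
    return index


def same_name(
    a: str,
    b: str,
    index: Dict[str, str],
    fallback: Optional[Callable[[str], str]] = None,
) -> bool:
    """Return True if a and b fall in the same ring.

    If neither name is in any ring and a fallback encoder is supplied
    (hypothetical here, e.g. a Double Metaphone function), compare the
    fallback codes instead.
    """
    key_a, key_b = normalize(a), normalize(b)
    in_ring_a, in_ring_b = key_a in index, key_b in index
    if in_ring_a or in_ring_b:
        # A name inside a ring only matches other members of that ring.
        return in_ring_a and in_ring_b and index[key_a] == index[key_b]
    if fallback is not None:
        return fallback(key_a) == fallback(key_b)
    return key_a == key_b


INDEX = build_ring_index([CHAUDRI_RING])
print(same_name("Chowdhury", "chdry", INDEX))
print(same_name("Chowdhury", "Smith", INDEX))
```

Checking the ring before the phonetic code means a hand-curated list overrides the hash wherever the two disagree, which is the point of the technique. The cost is that someone has to maintain the rings.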