I am using phonetic matching for different words in Java. i used Soundex but its too crude. i switched to Metaphone and realized it was better. However, when i rigorously tested it. i found weird behaviour. i was to ask whether thats the way metaphone works or am i using it in wrong way. In following example its works fine:-
Metaphone meta = new Metaphone();
if (meta.isMetaphoneEqual("cricket","criket")) System.out.prinlnt("Match 1");
if (meta.isMetaphoneEqual("cricket","criketgame")) System.out.prinlnt("Match 2");
This would Print
Match 1
Mathc 2
Now "cricket" does sound like "criket" but how come "cricket" and "criketgame" are the same. If some one would explain this. it would be of great help.
Your usage is slightly incorrect. A quick investigation of the encoded strings and default maximum code length shows that it is 4, which truncates the end of the longer "criketgame":
System.out.println(meta.getMaxCodeLen());
System.out.println(meta.encode("cricket"));
System.out.println(meta.encode("criket"));
System.out.println(meta.encode("criketgame"));
Output (note "criketgame" is truncated from "KRKTKM" to "KRKT", which matches "cricket"):
4 KRKT KRKT KRKT
meta.setMaxCodeLen(8);
System.out.println(meta.encode("cricket"));
System.out.println(meta.encode("criket"));
System.out.println(meta.encode("criketgame"));
Now outputs:
KRKT KRKT KRKTKM
And now your original test gives the expected results:
Metaphone meta = new Metaphone();
meta.setMaxCodeLen(8);
System.out.println(meta.isMetaphoneEqual("cricket","criket"));
System.out.println(meta.isMetaphoneEqual("cricket","criketgame"));
Printing:
true false
As an aside, you may also want to experiment with DoubleMetaphone
, which is an improved version of the algorithm.
By the way, note the caveat from the documentation regarding thread-safety:
The instance field
maxCodeLen
is mutable but is not volatile, and accesses are not synchronized. If an instance of the class is shared between threads, the caller needs to ensure that suitable synchronization is used to ensure safe publication of the value between threads, and must not invokesetMaxCodeLen(int)
after initial setup.