I have a file containing some phrases. Using jarowinkler by lucene, it is supposed to get me the most similar phrases of my input from that file.
Here is an example of my problem.
We have a file containing:
//phrases.txt
this is goodd
this is good
this is god
If my input is this is good, it is supposed to get me 'this is good' from the file first, since the similarity score here is the biggest (1). But for some reason, it returns: "this is goodd" and "this is god" only!
Here is my code:
try {
SpellChecker spellChecker = new SpellChecker(new RAMDirectory(), new JaroWinklerDistance());
Dictionary dictionary = new PlainTextDictionary(new File("src/main/resources/words.txt").toPath());
IndexWriterConfig iwc=new IndexWriterConfig(new ShingleAnalyzerWrapper());
spellChecker.indexDictionary(dictionary,iwc,false);
String wordForSuggestions = "this is good";
int suggestionsNumber = 5;
String[] suggestions = spellChecker.suggestSimilar(wordForSuggestions, suggestionsNumber,0.8f);
if (suggestions!=null && suggestions.length>0) {
for (String word : suggestions) {
System.out.println("Did you mean:" + word);
}
}
else {
System.out.println("No suggestions found for word:"+wordForSuggestions);
}
} catch (IOException e) {
e.printStackTrace();
}
suggestSimilar
won't provide suggestions which are identical to the input. To quote the source code:
// don't suggest a word for itself, that would be silly
If you want to know whether wordForSuggestions
is in the dictionary, use the exist
method:
if (spellChecker.exist(wordForSuggestions)) {
//do what you want for an, apparently, correctly spelled word
}