solr spell-checking

Solr file-based spellchecking does not detect correctly spelled words


I have a working Solr v9.4.1 installation, to which I want to add file-based spellchecking so that I can identify search terms that are correctly spelled, even if they don't appear in the main index of searchable documents.

Following the Apache Solr documentation, I have set up a file-based spellchecker and a request handler that refers to it:

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
        <lst name="spellchecker">
            <str name="classname">solr.FileBasedSpellChecker</str>
            <str name="name">filebased</str>
            <str name="sourceLocation">dictionary.txt</str>
            <str name="characterEncoding">UTF-8</str>
            <str name="spellcheckIndexDir">./spellcheckerFile</str>
        </lst>
    </searchComponent>

    ...

    <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
        <lst name="defaults">
            <str name="df">suggest</str>
            <str name="spellcheck.dictionary">filebased</str>
            <str name="spellcheck.extendedResults">true</str>
            <str name="spellcheck.count">3</str>
            <str name="spellcheck.maxResultsForSuggest">0</str>
        </lst>
        <arr name="last-components">
            <str>spellcheck</str>
        </arr>
    </requestHandler>

However, every query to this new request handler returns a false negative: ... "correctlySpelled":false ...

This occurs even when the query term is itself one of the suggestions offered for another term. For example, "ablation" and "oblation" are each offered as a suggested correct spelling for the other, yet both are deemed incorrectly spelled when used as the query term.
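A minimal sketch of how this check might be scripted, for anyone trying to reproduce it. The base URL, the core name `mycore`, and the helper names are assumptions made for illustration; only the `/spell` handler path and the `correctlySpelled` key come from the setup described above.

```python
from urllib.parse import urlencode


def spell_url(term, base="http://localhost:8983/solr/mycore"):
    """Build a /spell request URL for one term.

    The base URL and core name are hypothetical; adjust to the real install.
    """
    params = {"q": term, "spellcheck": "true", "wt": "json"}
    return f"{base}/spell?{urlencode(params)}"


def correctly_spelled(response):
    """Return the correctlySpelled flag from a parsed JSON response.

    Returns None when the response has no spellcheck section or no flag.
    """
    return response.get("spellcheck", {}).get("correctlySpelled")
```

Fetching `spell_url("ablation")` and `spell_url("oblation")`, parsing each body as JSON, and passing the result to `correctly_spelled` would show the flag for both terms side by side.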

Even with these results, I'm certain that Solr is in fact reading the dictionary file I have defined, because words like "ablation" and "oblation" do not appear anywhere in the main index of searchable documents. The only way Solr can know about them, and offer them as suggestions, is by reading them from the specified dictionary file.

Is anyone successfully using file-based spellchecking?

Are there specific steps that need to be taken which aren't detailed, or aren't perfectly clear, in the Solr documentation?

Can someone share their Solr config for a working file-based spellchecking setup?

(Alternatively, is there another way to accomplish the primary goal: "identify search terms that are correctly spelled, even if they don't appear in the main index of searchable documents"?)


Solution

  • With no direct resolution to this problem, I decided to instead add the dictionary terms to the main index.

    I first added the dictionary terms to a new table in our database, the same database that supplies the main corpus of searchable material. (The added data is tiny compared to the regular data.) I then indexed the dictionary terms with (a) fields that cause them to be returned as results and (b) a field that distinguishes these dictionary-term results from the regular results. Finally, I added logic to the application code that queries Solr so that it can detect when a result set consists only of dictionary terms. This tells me when the user has supplied a correctly spelled word that nonetheless matches none of the regular material.
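The detection step in the calling code could look something like the sketch below. It is an illustration under assumptions, not the actual application code: the marker field name `is_dictionary_term`, the function names, and the three outcome labels are all hypothetical, and `docs` stands for the list of result documents already parsed from Solr's JSON response.

```python
def only_dictionary_terms(docs, marker_field="is_dictionary_term"):
    """True when the result set is non-empty and every doc is a dictionary entry.

    marker_field is a hypothetical name for the distinguishing field.
    """
    return bool(docs) and all(doc.get(marker_field) is True for doc in docs)


def classify_query(docs):
    """Classify a result set into one of three illustrative outcomes."""
    if not docs:
        # Nothing matched at all: the term may be misspelled.
        return "no_match"
    if only_dictionary_terms(docs):
        # Only dictionary entries matched: a real word with no regular content.
        return "correctly_spelled_no_content"
    # At least one regular document matched.
    return "has_content"
```

With this shape, a query that returns only dictionary entries is reported as a correctly spelled word with no matching material, while an empty result set remains the signal for a possible misspelling.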