solr solrj

Can we apply fuzzy search to the words in a Solr synonyms.txt file?


We have a requirement to search for records that match a synonym. For example, suppose the indexed documents have a field whose value is worst in some records and bad in others. I have a synonyms.txt configured with bad, worst, dreadful as synonym terms, and the field's type is text_general, which is configured with the synonym filter in the query-type analyzer.

When a user queries with dreadful, the records containing bad and worst are returned from the index. When I search with a fuzzy query, dreadf~2, no records are returned, although I expect the records with bad and worst. How can this be achieved? Can we implement custom code to meet this requirement, since a user may even make a typo when entering a synonym?

Below is the field configuration in schema.xml:

<fieldType name="text_general" class="solr.TextField"
   positionIncrementGap="100" multiValued="true">
       <analyzer type="index">
         <tokenizer class="solr.StandardTokenizerFactory"/>
         <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
         <filter class="solr.LowerCaseFilterFactory"/>
       </analyzer>
       <analyzer type="query">
         <tokenizer class="solr.StandardTokenizerFactory"/>
         <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
         <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
         <filter class="solr.LowerCaseFilterFactory"/>
       </analyzer>
</fieldType>

   <field name="emotion" type="text_general"/>

Solution

  • Thanks for the schema. If you enter dreadf~2, the synonyms won't be triggered, because dreadf doesn't match any term in the synonyms file.

    If you debug the query you'll see this for dreadful:

    +title:"(worst dreadful) bad"
    

    and this for dreadf~2:

    <str name="parsedquery_toString">+title:"dreadf 2"</str>
    

    You would probably need to add synonyms for dreadf, or search for dreadful~2 instead.

    Note that this link discusses some of the shortcomings of query time synonym expansion: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
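
On the question of custom code: one possible approach, which is not part of the answer above and is only a sketch, is to correct the user's term against the synonym list on the client before the query reaches Solr. The corrected term (for example dreadful) then matches a synonym entry and is expanded as usual. The class name SynonymTypoCorrector, the hard-coded term list (standing in for the contents of synonyms.txt) and the edit limit of 2 are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class SynonymTypoCorrector {

    // Assumption: a hard-coded copy of the terms listed in synonyms.txt.
    static final List<String> SYNONYM_TERMS = Arrays.asList("bad", "worst", "dreadful");

    // Levenshtein edit distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the closest synonym term within maxEdits edits, or empty if none qualifies.
    static Optional<String> correct(String input, int maxEdits) {
        String term = input.toLowerCase();
        String best = null;
        int bestDistance = maxEdits + 1;
        for (String candidate : SYNONYM_TERMS) {
            int distance = editDistance(term, candidate);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = candidate;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        String userInput = "dreadf";
        // Fall back to the raw input when no synonym term is close enough.
        String queryTerm = correct(userInput, 2).orElse(userInput);
        System.out.println("emotion:" + queryTerm);
    }
}
```

The resulting string could then be passed as the query to Solr (for instance through SolrJ's SolrQuery), where the query-time synonym filter sees dreadful instead of dreadf. Whether this fits depends on how large the synonym list is and how it is kept in sync with synonyms.txt.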