
Different search results for the same tokenizer in Solr 5 and Solr 8


I have indexed records that contain a field called birth_date. It is neither a stored field nor a date field: it is a text field (solr.TextField) that uses the standard tokenizer. In Solr 5, when I ran the query

q=*:*&fq=birth_date:1989/01/01

I got about 33 filtered records, but when I run the same query in Solr 8 (with the same field definition), I get more than 6000 results.

Below is the schema of the field

<fieldtype name='birth_date' class='solr.TextField' sortMissingLast='true' omitNorms='true'>
   <analyzer>
      <tokenizer class='solr.StandardTokenizerFactory'/>
   </analyzer>
</fieldtype>

<field name='birth_date' type='birth_date' indexed='true' stored='false' multiValued='false' required='false'/>
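To see why a single date becomes several terms with this field type, here is a rough Java sketch of what the tokenizer does to the value. It is an approximation, not Lucene's StandardTokenizer: the class name `StandardTokenizerApprox` is hypothetical, and the "split on anything that is not a letter or digit" rule is an assumption that happens to split this value the same way.

```java
import java.util.ArrayList;
import java.util.List;

public class StandardTokenizerApprox {

    // Rough approximation only: split on any run of characters that are not
    // letters or digits. The real StandardTokenizer follows Unicode UAX #29
    // word-break rules; for a slash-separated date it also breaks at each '/'.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) { // split() can yield a leading empty string
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The value used in fq=birth_date:1989/01/01
        System.out.println(tokenize("1989/01/01")); // prints [1989, 01, 01]
    }
}
```

The point is that the field never contains a term `1989/01/01`; whether the query matches depends entirely on how the query parser combines the three resulting terms.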

Between Solr 5 and Solr 8 I don't see any change in solr.StandardTokenizerFactory, but I did notice that the default similarity has changed. Why doesn't the search return the same results?

I tried running q=*:*&fq=birth_date:1989/01/01 and expected the same number of results in Solr 5 and Solr 8.


Solution

  • Debugging the query showed that in Solr 5 the filter query was parsed as a phrase query:

    "parsed_filter_queries": [
      "PhraseQuery(birth_date:\"1989 01 01\")"
    ]
    

    In Solr 8, however, it was parsed as three separate term clauses:

    "parsed_filter_queries":["birth_date:1989 birth_date:01 birth_date:01"]

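    None of these clauses is required, so the filter matches any document that contains 1989 or 01, which is why Solr 8 returned more than 6000 results. Only after wrapping the value in double quotes (fq=birth_date:"1989/01/01") did Solr 8 parse it as a phrase query again.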

    Another workaround was to change the field type so that the whole date is indexed and queried as a single token:

    <fieldtype name="birth_date" class="solr.TextField" sortMissingLast="true" omitNorms="true">
       <analyzer type="index">
          <tokenizer class="solr.KeywordTokenizerFactory" />
          <filter class="solr.PatternReplaceFilterFactory" pattern="([^0-9])" replacement="" replace="all" />
       </analyzer>
       <analyzer type="query">
          <tokenizer class="solr.KeywordTokenizerFactory" />
          <filter class="solr.PatternReplaceFilterFactory" pattern="([^0-9])" replacement="" replace="all" />
       </analyzer>
    </fieldtype>
    

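    Both the index and query analyzers keep the whole value as one token (KeywordTokenizerFactory) and then remove every character that is not a digit, so 1989/01/01 is indexed and searched as 19890101. Because this changes index-time analysis, existing documents have to be reindexed for it to take effect.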
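The three behaviors described in the solution (unquoted terms in Solr 8, a quoted phrase, and the keyword-plus-digits field type) can be compared on a toy index. This is a plain-Java model under stated assumptions, not Solr or Lucene code: the class and method names are hypothetical, the tokenizer is approximated by splitting on non-alphanumerics, phrase matching is modeled as "tokens appear consecutively in order" (reasonable for a single-valued field), and the sample dates are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class FilterModes {

    // Approximates StandardTokenizer for simple values: split on non-alphanumerics.
    static List<String> standardTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) tokens.add(part);
        }
        return tokens;
    }

    // Mimics KeywordTokenizer + PatternReplaceFilter(pattern="([^0-9])", replacement=""):
    // the whole value stays one token and every non-digit is removed.
    static String keywordDigits(String text) {
        return text.replaceAll("([^0-9])", "");
    }

    // Unquoted fq as parsed in Solr 8: optional term clauses, so one shared token is enough.
    static boolean matchesAnyTerm(String indexed, String query) {
        List<String> docTokens = standardTokens(indexed);
        for (String t : standardTokens(query)) {
            if (docTokens.contains(t)) return true;
        }
        return false;
    }

    // Quoted fq (phrase): query tokens must appear consecutively and in order.
    static boolean matchesPhrase(String indexed, String query) {
        return Collections.indexOfSubList(standardTokens(indexed), standardTokens(query)) >= 0;
    }

    // Workaround field type: compare the single normalized token on both sides.
    static boolean matchesKeyword(String indexed, String query) {
        return keywordDigits(indexed).equals(keywordDigits(query));
    }

    public static void main(String[] args) {
        // Hypothetical indexed values, purely for illustration.
        List<String> docs = List.of("1989/01/01", "1989/07/23", "2001/01/15", "1975/12/01", "1962/11/30");
        String query = "1989/01/01";
        int any = 0, phrase = 0, keyword = 0;
        for (String d : docs) {
            if (matchesAnyTerm(d, query)) any++;
            if (matchesPhrase(d, query)) phrase++;
            if (matchesKeyword(d, query)) keyword++;
        }
        // prints "any term: 4, phrase: 1, keyword: 1"
        System.out.println("any term: " + any + ", phrase: " + phrase + ", keyword: " + keyword);
    }
}
```

Even in this tiny sample, the unquoted form matches every date that shares a year or a "01" with the query, which is the same effect that inflated the Solr 8 result count. One side effect of the digits-only normalization is that differently punctuated inputs such as 1989-01-01 and 1989/01/01 map to the same token, which may or may not be desirable.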