Hi, I have a field with the following schema:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory" catenateNumbers="1" generateNumberParts="1" protected="protwords.txt" splitOnCaseChange="1" generateWordParts="0" preserveOriginal="1" catenateAll="0" catenateWords="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory" catenateNumbers="0" protected="protwords.txt" splitOnCaseChange="1" generateWordParts="0" preserveOriginal="1" catenateAll="0" catenateWords="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
I am storing complete PDF documents.
Now suppose I have 4 documents with the following content:
1. stackoverflow is a good site.
2. stack-overflow is a good site.
3. stack overflow is a good site.
4. stackoverflow2018 is a good site.
Now, when I search for stackoverflow, it should return document 1.
When I search for stack-overflow, it should return document 2.
When I search for stack overflow, it should return document 3.
When I search for stackoverflow2018, it should return document 4.
What should the schema be for this? The schema above does not work in this case. Is there anything I could specify in the query instead?
A Word Delimiter Graph Filter will split on non-alphanumerics (such as -), case changes, and transitions between letters and digits by default.
The rules for determining delimiters are as follows:
A change in case within a word: "CamelCase" -> "Camel", "Case". This can be disabled by setting splitOnCaseChange="0".
A transition from alpha to numeric characters or vice versa: "Gonzo5000" -> "Gonzo", "5000" and "4500XL" -> "4500", "XL". This can be disabled by setting splitOnNumerics="0".
Non-alphanumeric characters (discarded): "hot-spot" -> "hot", "spot"
A trailing "'s" is removed: "O’Reilly’s" -> "O", "Reilly"
Any leading or trailing delimiters are discarded: "--hot-spot--" -> "hot", "spot"
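As a rough sketch of the two switches named in the list above, the index-time filter line from the question could be adjusted as follows. Only splitOnCaseChange and splitOnNumerics are changed; every other attribute value is copied from the question, and the result is untested against the asker's data. Hyphens are still treated as delimiters here, so with catenateWords="1" the text "stack-overflow" would still also be indexed as "stackoverflow".

```xml
<!-- Sketch only: case changes and letter/digit transitions no longer split a token,
     so "stackoverflow2018" is kept whole. Non-alphanumeric characters still split. -->
<filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="0" splitOnNumerics="0" catenateNumbers="1" generateNumberParts="1" protected="protwords.txt" generateWordParts="0" preserveOriginal="1" catenateAll="0" catenateWords="1"/>
```

The same two attributes would need to be set on the query analyzer's filter line as well, so that both sides tokenize consistently.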
If you don't want that behavior, remove the WordDelimiterFilter from your filter list and add other filters to support the part of the WDF behavior that you need.
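A minimal sketch of what removing the filter could look like, assuming exact whitespace-delimited matching is all that is needed. The field type name text_ws is hypothetical, and the synonym filter from the original query analyzer is left out here for brevity; add it back if you rely on it.

```xml
<!-- Hypothetical field type: tokens are split on whitespace only, then lowercased.
     "stack-overflow" and "stackoverflow2018" each remain a single token. -->
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <!-- An analyzer without a type attribute is used for both indexing and querying. -->
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the index-time analysis changes, the documents have to be reindexed before the new behavior shows up. Note also that with whitespace-only tokenization, punctuation stays attached to words, so "site." is indexed with its trailing period.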