Writing a stemmer that stores both the stem and the original word has caused some problems with getting phrases to highlight using the FastVectorHighlighter.
The input string is foo bar baz, with ba being the stem of bar. The image below illustrates the analysis.
A phrase search yields a match but no highlighting at all:
http://localhost:8080/solr/select
?q="foo bar baz"
&qf=text
&hl.requireFieldMatch=true
&hl.fl=text
&hl.usePhraseHighlighter=true
&hl.boundaryScanner=breakIterator
&hl.useFastVectorHighlighter=true
&hl=true
&defType=edismax
Here hl.bs.type=WORD is used by the boundaryScanner.
Either setting hl.useFastVectorHighlighter=false or removing the quotes from the query results in highlighting for all terms.
Solr 3.6.2 is being used, and the field is defined as follows:
<field name="text" type="text" indexed="true" stored="true"
multiValued="true" termVectors="true"
termPositions="true" termOffsets="true"/>
It is analyzed as:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="my.custom.StemmerFactory" preserveOriginal="true"/>
</analyzer>
</fieldType>
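For readers who want the analysis in text form, here is a rough Python simulation of the token stream such a chain might produce for foo bar baz. It is only a sketch under assumptions: whitespace splitting stands in for the StandardTokenizer, the stem is assumed to be emitted at the same position and with the same offsets as the original word (the usual convention for preserveOriginal-style filters, not confirmed for my.custom.StemmerFactory), and the `stems` table is hypothetical apart from bar -> ba.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    term: str
    position: int       # 1-based token position
    start_offset: int   # character offset where the source word starts
    end_offset: int     # character offset just past the source word


def analyze(text: str, stems: Dict[str, str]) -> List[Token]:
    """Emit each word, followed by its stem stacked at the same position."""
    tokens: List[Token] = []
    position = 0
    search_from = 0
    for word in text.split():
        start = text.index(word, search_from)
        end = start + len(word)
        search_from = end
        position += 1
        tokens.append(Token(word, position, start, end))
        stem = stems.get(word)
        if stem is not None and stem != word:
            # Assumed behaviour: the stem shares position and offsets with
            # the original, like a token with position increment 0.
            tokens.append(Token(stem, position, start, end))
    return tokens


# Hypothetical stem table; only bar -> ba comes from the question.
tokens = analyze("foo bar baz", {"bar": "ba"})
for token in tokens:
    print(token)
```

If the real filter behaves this way, bar and ba occupy the same position, so a phrase query can match through either term while the highlighter still has to cover the full offsets of the original words.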
Turns out hl.fragsize wasn't set to a large enough value to include the entire highlighted sequence. The silly problems are often the worst.
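In terms of the request from the question, the fix amounts to passing a larger fragment size alongside the existing highlighting parameters. The Python sketch below rebuilds that request URL with the extra parameter. The value 200 is an arbitrary placeholder, not the value actually used, and the parameter is spelled hl.fragsize here, the lowercase form used in Solr's highlighting parameter documentation.

```python
from urllib.parse import urlencode

# Parameters from the original request, plus the fragment size.
params = [
    ("q", '"foo bar baz"'),
    ("defType", "edismax"),
    ("qf", "text"),
    ("hl", "true"),
    ("hl.fl", "text"),
    ("hl.requireFieldMatch", "true"),
    ("hl.usePhraseHighlighter", "true"),
    ("hl.useFastVectorHighlighter", "true"),
    ("hl.boundaryScanner", "breakIterator"),
    ("hl.bs.type", "WORD"),
    # Assumed value: pick something at least as long as the longest
    # phrase that should be highlighted in one fragment.
    ("hl.fragsize", "200"),
]

url = "http://localhost:8080/solr/select?" + urlencode(params)
print(url)
```

The right size depends on the data; it needs to be large enough that the whole matched phrase fits inside a single fragment.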