java solr lucene stemming fast-vector-highlighter

FastVectorHighlighter phrase highlighting not working with stemming


I'm writing a stemmer that stores both the stem and the original word, and it has caused problems with getting phrases to highlight using the FastVectorHighlighter.

The input string is foo bar baz, with ba being the stem of bar. The image below illustrates the analysis:

[image: analysis output for the text field]
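
As a rough text model of that analysis, the sketch below assumes the stem is stacked at the same position as the original word (position increment 0) and shares its character offsets, which is how preserve-original filters usually behave. The Token class and the one-word stem rule are hypothetical stand-ins for illustration, not the actual my.custom.StemmerFactory.

```java
import java.util.ArrayList;
import java.util.List;

public class StackedTokens {
    // Hypothetical token model: term text, position increment, character offsets.
    static final class Token {
        final String term;
        final int posInc;
        final int start;
        final int end;

        Token(String term, int posInc, int start, int end) {
            this.term = term;
            this.posInc = posInc;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return term + "(+" + posInc + "," + start + "-" + end + ")";
        }
    }

    // Toy stand-in for the custom stemmer: it only knows bar -> ba.
    static String stem(String word) {
        return word.equals("bar") ? "ba" : word;
    }

    // Emit each original word, then its stem at the same position if it differs.
    static List<Token> analyze(String text) {
        List<Token> out = new ArrayList<Token>();
        int offset = 0;
        for (String word : text.split(" ")) {
            int start = text.indexOf(word, offset);
            int end = start + word.length();
            out.add(new Token(word, 1, start, end));
            String stemmed = stem(word);
            if (!stemmed.equals(word)) {
                // Assumption: stem shares position and offsets with the original.
                out.add(new Token(stemmed, 0, start, end));
            }
            offset = end;
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [foo(+1,0-3), bar(+1,4-7), ba(+0,4-7), baz(+1,8-11)]
        System.out.println(analyze("foo bar baz"));
    }
}
```

Under this assumption, both bar and ba occupy position 2, so a phrase query for "foo bar baz" can match through either term.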

A phrase search yields a match but no highlighting at all:

http://localhost:8080/solr/select
   ?q="foo bar baz"
   &qf=text
   &hl.requireFieldMatch=true
   &hl.fl=text
   &hl.usePhraseHighlighter=true
   &hl.boundaryScanner=breakIterator
   &hl.useFastVectorHighlighter=true
   &hl=true
   &defType=edismax

hl.bs.type=WORD is used by the boundaryScanner.

Either setting hl.useFastVectorHighlighter=false or removing the quotes from the query results in highlighting for all terms.

I'm using Solr 3.6.2, and the field is defined as follows:

<field name="text" type="text" indexed="true" stored="true" 
     multiValued="true" termVectors="true" 
     termPositions="true" termOffsets="true"/>

It is analyzed as:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="my.custom.StemmerFactory" preserveOriginal="true"/>
    </analyzer>
</fieldType>

Solution

  • Turns out hl.fragsize wasn't set to a large enough value to include the entire highlighted sequence. The silly problems are often the worst.
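
For reference, here is a minimal sketch of the same request with the fragment size set explicitly. Solr spells the parameter hl.fragsize (all lowercase). The value 200 is only an illustrative assumption, as are the HighlightRequest class and its helper methods; the value that is actually large enough depends on the data.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class HighlightRequest {
    // Join the parameters into a URL-encoded query string on top of the base URL.
    static String buildUrl(String base, Map<String, String> params) {
        StringBuilder url = new StringBuilder(base);
        char sep = '?';
        try {
            for (Map.Entry<String, String> p : params.entrySet()) {
                url.append(sep)
                   .append(URLEncoder.encode(p.getKey(), "UTF-8"))
                   .append('=')
                   .append(URLEncoder.encode(p.getValue(), "UTF-8"));
                sep = '&';
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
        return url.toString();
    }

    // Parameters from the question, plus an explicit fragment size.
    static Map<String, String> highlightParams(int fragsize) {
        Map<String, String> p = new LinkedHashMap<String, String>();
        p.put("q", "\"foo bar baz\"");
        p.put("defType", "edismax");
        p.put("qf", "text");
        p.put("hl", "true");
        p.put("hl.fl", "text");
        p.put("hl.useFastVectorHighlighter", "true");
        p.put("hl.usePhraseHighlighter", "true");
        p.put("hl.requireFieldMatch", "true");
        p.put("hl.boundaryScanner", "breakIterator");
        p.put("hl.bs.type", "WORD");
        p.put("hl.fragsize", String.valueOf(fragsize)); // illustrative value only
        return p;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("http://localhost:8080/solr/select", highlightParams(200)));
    }
}
```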