Tags: solr, lucene, solr4

Solr - No extra score for repeating words from query in document


I want to score a term match only once, regardless of how many times the term occurs in the document.

Ex - Search Query - Parle G Biscuits

Document 1 - Parle G Biscuits
Document 2 - Parle G Biscuits. I can eat 10 packets of Parle G Biscuits anytime. 
Document 3 - Parle G Biscuits V2 

I want to rank documents as Doc 1 > Doc 3 > Doc 2
Default answer from Solr - Doc 2 > Doc 1 > Doc 3

This happens because the query string occurs twice in the longer document. If I could stop Solr from scoring the repeated occurrence, I would get the desired order, because Documents 2 and 3 would be slightly penalized for their greater length.

How can I modify Solr to behave this way?

Thanks!


Solution

  • If you don't need term positions (for example, if you aren't searching with phrase queries such as foo:"word1 word2"), you can configure the field to drop term frequency information, positions, and payloads by setting omitTermFreqAndPositions="true" on the field definition. The documentation describes the setting as follows:

    If true, omits term frequency, positions, and payloads from postings for this field. This can be a performance boost for fields that don't require that information. It also reduces the storage space required for the index. Queries that rely on position that are issued on a field with this option will silently fail to find documents. This property defaults to true for all field types that are not text fields.

    As there is no separate setting for dropping only term frequency, you'll have to implement a custom similarity if you need the other two features (positions and payloads) that this setting disables.
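
A custom similarity for this case mainly needs to flatten the term-frequency factor. In Lucene 4.x that factor comes from tf(float freq) on DefaultSimilarity, which returns the square root of the frequency; a subclass could override it to return 1 for any positive frequency and be registered with a <similarity> element in schema.xml. The sketch below does not use the Lucene classes, so it runs on its own. It is a simplified model under stated assumptions: the class name BinaryTfSketch, the tokenizer, and the 1/sqrt(numTerms) length norm are illustrative, and idf, query normalization, coord, and Lucene's lossy norm encoding are all ignored, so the numbers will not match Solr's explain output.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified model of TF scoring; not Lucene's actual implementation.
final class BinaryTfSketch {

    // Classic default: the tf factor grows with the square root of the frequency.
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Clamped variant: any number of occurrences counts as a single match.
    static float binaryTf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    // Simplified length norm (assumption): shorter fields get a higher factor.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Naive tokenizer (assumption): lowercase, split on anything non-alphanumeric.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!part.isEmpty()) {
                terms.add(part);
            }
        }
        return terms;
    }

    // Sum of clamped tf over the query terms, scaled by the length norm.
    static float score(String query, String doc) {
        List<String> docTerms = tokenize(doc);
        if (docTerms.isEmpty()) {
            return 0.0f;
        }
        float sum = 0.0f;
        for (String term : tokenize(query)) {
            sum += binaryTf(Collections.frequency(docTerms, term));
        }
        return sum * lengthNorm(docTerms.size());
    }

    public static void main(String[] args) {
        String query = "Parle G Biscuits";
        String[] docs = {
            "Parle G Biscuits",
            "Parle G Biscuits. I can eat 10 packets of Parle G Biscuits anytime.",
            "Parle G Biscuits V2"
        };
        for (int i = 0; i < docs.length; i++) {
            System.out.println("Document " + (i + 1) + ": " + score(query, docs[i]));
        }
    }
}
```

With the clamped tf, the second occurrence of each term in Document 2 adds nothing, so only the length factor separates the three documents and the model orders them Doc 1 > Doc 3 > Doc 2. Norms must remain enabled on the field for the length penalty to apply in Solr.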