solr autocomplete autosuggest

Is there a way in Solr to return true result counts with auto-suggestions?


I have a Solr v9.5 service set up to allow users to search a corpus of end-user content. I have looked into three ways to implement auto-suggestions which show the user the number of results they should expect for each suggestion. Each one has shortcomings:

Implementation A : During indexing I collect searchable fields into a 'suggestions' field, and have a <requestHandler> tuned to return values from that field. PRO: Returns values that are extremely relevant to the actual collection of indexed documents. CON: To get the number of results for each suggestion, I have to issue a separate, regular query per suggestion and parse out the numFound, which adds a lot of time before I can show any suggestions at all.

Implementation B : Solr has a Suggester facility. PRO: Is specifically suited to auto-suggestions. CON: Nothing in the documentation suggests support for returning result counts for each returned suggestion.

Implementation C : Solr has a Terms Component facility for fetching counts of documents which match input terms, which the documentation states and demonstrates can be used for auto-suggestions. PRO: This is the closest to what I want, returning both matches and counts in one operation. CON: The counts do not correspond to the actual number of results when those matched terms are used in a search query.

Further detail for Implementation C: If I submit this Terms query, where the terms.fl fields match the qf fields of a regular user search: ...

http://localhost:8983/solr/my_core/terms?wt=json&terms.limit=-1&terms.regex.flag=case_insensitive&terms.fl=title_list&terms.fl=topic_list&terms.fl=summary&terms.fl=category_name_list&terms.fl=genre&terms.fl=skill_list&terms.fl=skill_type_list&terms.fl=search_terms&terms.fl=landing_page_keyword&terms.fl=language_name&terms.regex=.*canada.*

... I receive the following Solr response: ...

{
  "responseHeader":{
    "status":0,
    "QTime":88},
  "terms":{
    "title_list":[
      "canada",12],
    "topic_list":[
      "canada",52],
    "summary":[
      "canada",32,
      "canada,",10,
      "canada.",10],
    "category_name_list":[],
    "genre":[],
    "skill_list":[],
    "skill_type_list":[],
    "search_terms":[
      "canada",7,
      "canada,",7],
    "landing_page_keyword":[],
    "language_name":[]}
}

The problem is that if I run a regular end-user search for "canada", I get 71 results. Totaling the document counts from the above response gives 130. (Even if I leave out the counts for "canada." and "canada,", I get 103.)

I assume the disparity arises because a given document might, for example, have "canada" in both the title and the summary, and so gets counted once per field in the plain per-field document counts that the Terms Component returns.
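To make the arithmetic concrete, here is a minimal Python sketch. The `terms` dict is copied from the non-empty fields of the response above; the document IDs in `docs_by_field` are made up purely to illustrate the suspected overlap and do not come from the real index.

```python
# Terms Component output for one field is a flat list: [term, count, term, count, ...].
terms = {
    "title_list": ["canada", 12],
    "topic_list": ["canada", 52],
    "summary": ["canada", 32, "canada,", 10, "canada.", 10],
    "search_terms": ["canada", 7, "canada,", 7],
}


def pairs(flat):
    """Turn [term, count, term, count, ...] into (term, count) tuples."""
    return list(zip(flat[0::2], flat[1::2]))


# Sum of every per-field count, and of the exact term "canada" only.
total_all = sum(c for flat in terms.values() for _, c in pairs(flat))
total_exact = sum(
    c for flat in terms.values() for t, c in pairs(flat) if t == "canada"
)

# Hypothetical document IDs (an assumption, for illustration only):
# documents 2 and 3 contain the term in both fields.
docs_by_field = {
    "title_list": {1, 2, 3},
    "summary": {2, 3, 4},
}
summed = sum(len(ids) for ids in docs_by_field.values())  # counts docs 2 and 3 twice
distinct = len(set().union(*docs_by_field.values()))      # what a search would return

print(total_all, total_exact, summed, distinct)
```

Summing per-field counts can only match the search result count if no document contains the term in more than one field, which would explain why both totals exceed 71.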

So, is there a way to have Solr return auto-suggestion terms along with the exact number of search results for each suggested term?

I have wide latitude with the service implementation, so if there is a way to do this that involves none of the above, or that requires specialized configuration, I'm open to hearing about it.

Thank you.


Solution

  • I ended up with "Implementation A" from my question. The added time turned out to be trivial, chiefly (I'm guessing) because of how fast Solr / Lucene is.

    This has proven quite sufficient over a long period of production use. And if it ever starts to be too slow, there are straightforward options for caching the result count for a given term: the majority of users' searches are for a fairly limited set of terms, and the result counts are stable for timeframes on the order of days or weeks.
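For anyone wanting to try the same approach, the following is a rough Python sketch of the idea rather than my actual code. It assumes the stock /select handler with edismax, a hypothetical `QF` field list, and a simple in-memory cache with a time-to-live; `fetch_num_found` and `CountCache` are names invented for this example. Requesting rows=0 makes Solr return only the count (numFound), not the documents.

```python
import json
import time
import urllib.parse
import urllib.request

SOLR_SELECT = "http://localhost:8983/solr/my_core/select"  # assumed handler path
QF = "title_list topic_list summary search_terms"          # assumed subset of qf fields


def fetch_num_found(term):
    """Ask Solr how many documents a regular search for `term` would match."""
    params = urllib.parse.urlencode({
        "q": term,
        "defType": "edismax",
        "qf": QF,
        "rows": 0,      # only the count is needed, not the documents
        "wt": "json",
    })
    with urllib.request.urlopen(f"{SOLR_SELECT}?{params}", timeout=5) as resp:
        return json.load(resp)["response"]["numFound"]


class CountCache:
    """Small in-memory cache mapping term -> (count, expiry time)."""

    def __init__(self, fetch=fetch_num_found, ttl_seconds=24 * 3600,
                 clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}

    def count(self, term):
        now = self._clock()
        hit = self._entries.get(term)
        if hit is not None and hit[1] > now:
            return hit[0]                      # still fresh: no Solr round trip
        value = self._fetch(term)
        self._entries[term] = (value, now + self._ttl)
        return value

    def with_counts(self, suggestions):
        """Pair each suggestion with its (possibly cached) result count."""
        return [(s, self.count(s)) for s in suggestions]
```

Usage would look like `CountCache().with_counts(["canada", "canada geese"])`, where the suggestion strings come from the 'suggestions' request handler. The one-day default TTL is only a guess based on the counts being stable for days or weeks.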