solr, fuzzy-search, edismax

eDisMax parser issue running over multiple fields


Environment: Solr 8.9.0, Java version "11.0.12" 2021-07-20 LTS

The following CSV file is indexed in Solr:

books_id,cat,name,price,inStock,author,series_t,sequence_i,genre_s
0553573403,book,Game Thrones Clash,7.99,true,George R.R. Martin,"A Song of Ice and Fire",1,fantasy
0553573404,book,Game Thrones,7.99,true,George Martin,"A Song of Ice and Fire",1,fantasy
0553573405,book,Game Thrones,7.99,true,George,"A Song of Ice and Fire",1,fantasy

I want to search for a book whose name is 'Game Thrones Clash' (with mm=75%) and whose author is 'George R.R. Martin' (with mm=70%).

The book name should be searched only in the 'name' field, with its own minimum-match value, and the author only in the 'author' field, with a different mm value.

The fields 'name' and 'author' use the field type text_general, with multiValued set to false.

The query should run over the field 'name' (mm=75%) with the value 'Game Thrones Clash' and over the field 'author' (mm=70%) with the value 'George R.R. Martin'.

A document should be returned only if it satisfies all three of the following criteria:

  1. At least 75% of the query tokens fuzzy-match tokens in the 'name' field.
  2. At least 70% of the query tokens fuzzy-match tokens in the 'author' field.
  3. The field 'inStock' has the value 'true'.
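To make criteria 1 and 2 concrete: for a positive percentage, Solr computes mm as that percentage of the optional clauses and rounds the result down [6]. The sketch below applies that rule to this data. It assumes each query value yields three clauses (in particular that text_general keeps 'R.R.' as a single token) and that every exact token also counts as a fuzzy match; mm_required and mm_passes are hypothetical helpers, not Solr APIs.

```shell
#!/bin/sh
# Hypothetical helper (not a Solr API): minimum number of optional clauses
# that must match for a positive mm percentage. Solr rounds the computed
# number down; shell integer division truncates, which is the same thing
# for non-negative operands.
mm_required() {
  echo $(( $1 * $2 / 100 ))
}

# Hypothetical helper: succeeds if <matched> clauses satisfy <percent>% of <clauses>.
# Usage: mm_passes <matched> <clauses> <percent>
mm_passes() {
  [ "$1" -ge "$(mm_required "$2" "$3")" ]
}

# "Game~ Thrones~ Clash~" -> 3 clauses; mm=75% -> floor(2.25) = 2
echo "name:   $(mm_required 3 75) of 3 clauses required"
# "George~ R.R.~ Martin~" -> 3 clauses (assumed); mm=70% -> floor(2.1) = 2
echo "author: $(mm_required 3 70) of 3 clauses required"

# Author clauses matched per book (assumption: exact tokens also fuzzy-match).
for entry in 0553573403:3 0553573404:2 0553573405:1; do
  id=${entry%%:*}
  matched=${entry##*:}
  if mm_passes "$matched" 3 70; then
    echo "$id passes author mm=70%"
  else
    echo "$id fails author mm=70%"
  fi
done
```

Under these assumptions both fields need 2 of 3 clauses, so 0553573405 (author "George", 1 of 3) is the only book that fails the author criterion.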

The output should contain the following results:

0553573403 (name 75% matched and author 70% matched)
0553573404 (name 75% matched and author 70% matched)

The following books_id should not be in the output:

0553573405 (name 75% matched, but author not 70% matched)

I understand that Extended DisMax supports the 'mm' (Minimum Should Match) parameter together with fuzzy search, but the following query returns all 3 documents:

curl -G http://$solrIp:8983/solr/testCore2/select --data-urlencode "q=(name:'Game~' OR name:'Thrones~' OR name:'Clash~')" --data-urlencode "defType=edismax" --data-urlencode "mm=75%" --data-urlencode "q=(author:'George~' OR author:'R.R.~' OR author:'Martin~')" --data-urlencode "defType=edismax" --data-urlencode "mm=70%" --data-urlencode "sort=books_id asc"
{
  "responseHeader":{
    "status":0,
    "QTime":3,
    "params":{
      "mm":["75%",
        "70%"],
      "q":["(name:'Game~' OR name:'Thrones~' OR name:'Clash~')",
        "(author:'George~' AND author:'R.R.~' AND author:'Martin~')"],
      "defType":["edismax",
        "edismax"],
      "sort":"books_id asc"}},
  "response":{"numFound":3,"start":0,"numFoundExact":true,"docs":[
      {
        "books_id":[553573403],
        "cat":["book"],
        "name":"Game Thrones Clash",
        "price":[7.99],
        "inStock":[true],
        "author":"George R.R. Martin",
        "series_t":"A Song of Ice and Fire",
        "sequence_i":1,
        "genre_s":"fantasy",
        "id":"3de00ecb-fbaf-479b-bfde-6af7dd63c60f",
        "_version_":1738326424041816064},
      {
        "books_id":[553573404],
        "cat":["book"],
        "name":"Game Thrones",
        "price":[7.99],
        "inStock":[true],
        "author":"George Martin",
        "series_t":"A Song of Ice and Fire",
        "sequence_i":1,
        "genre_s":"fantasy",
        "id":"a036a400-4f54-4c90-a52e-888349ecb1da",
        "_version_":1738326424107876352},
      {
        "books_id":[553573405],
        "cat":["book"],
        "name":"Game Thrones",
        "price":[7.99],
        "inStock":[true],
        "author":"George",
        "series_t":"A Song of Ice and Fire",
        "sequence_i":1,
        "genre_s":"fantasy",
        "id":"36360825-1164-4cb6-bf48-ebeaaff0ef10",
        "_version_":1738326424111022080}]
  }}

Can someone help me write an eDisMax query for this, or suggest another way?


Solution

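  • The request in the question returns all three books because q, defType and mm are single-valued parameters: when one is repeated, Solr uses only its first value, so the author clause and mm=70% are ignored, and all three books satisfy the name clause on its own. Nested queries [1][2] can specify different query parameters (here mm [6] and df [7], passed as local params [3]) for different parts of a single query: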

    +_query_:"{!edismax mm=75% df=name} Game~ Thrones~ Clash~"
    +_query_:"{!edismax mm=70% df=author} George~ R.R.~ Martin~"
    +inStock:true
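Each clause is prefixed with + so that all three are required. If the field values come from user input, this q string can be assembled rather than typed by hand. A minimal sketch, assuming whitespace-separated input where every term should become fuzzy (~); nested_clause and build_q are hypothetical helper names, and they do not escape query-syntax characters (quotes, braces, backslashes) in the input.

```shell
#!/bin/sh
# Hypothetical helper: one required nested eDisMax clause of the form
#   +_query_:"{!edismax mm=<mm> df=<field>} term1~ term2~ ..."
# The body runs in a subshell so that 'set -f' (no globbing while the
# value is split on whitespace) does not leak into the caller.
nested_clause() (
  set -f
  field=$1 mm=$2 terms=""
  for term in $3; do
    terms="$terms ${term}~"   # make every term fuzzy
  done
  printf '+_query_:"{!edismax mm=%s df=%s}%s"' "$mm" "$field" "$terms"
)

# Hypothetical helper: full q = name clause AND author clause AND inStock:true.
build_q() {
  printf '%s %s +inStock:true' \
    "$(nested_clause name 75% "$1")" \
    "$(nested_clause author 70% "$2")"
}

build_q 'Game Thrones Clash' 'George R.R. Martin'
echo
```

The printed string is the same query as above on one line, so it can be passed as --data-urlencode "q=$(build_q 'Game Thrones Clash' 'George R.R. Martin')".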
    

    Testing

    Steps to reproduce the test with a local Solr 9 running in SolrCloud mode, which the Schema Designer page requires:

    Request: one query string

    bash: (\ before newline continues line, omit inside '...')

    curl --silent --get localhost:8983/api/schema-designer/query \
      --data-urlencode "_=1658489222229" \
      --data-urlencode "configSet=eDisMaxTest1" \
      --data-urlencode 'q= +_query_:"{!edismax mm=75% df=name} Game~ Thrones~ Clash~" 
                           +_query_:"{!edismax mm=70% df=author} George~ R.R.~ Martin~"
                           +inStock:true' \
      --data-urlencode "sort=books_id asc"
    

    cmd: (^ before newline continues line, even inside '...')

    curl --silent --get localhost:8983/api/schema-designer/query ^
      --data-urlencode "_=1658489222229" ^
      --data-urlencode "configSet=eDisMaxTest1" ^
      --data-urlencode 'q= +_query_:"{!edismax mm=75% df=name} Game~ Thrones~ Clash~" ^
                           +_query_:"{!edismax mm=70% df=author} George~ R.R.~ Martin~" ^
                           +inStock:true' ^
      --data-urlencode "sort=books_id asc"
    

    Alternative request: use the local parameter v=$param in each nested query to refer to other request parameters that contain the subquery terms.

    bash: (\ before newline continues line, omit inside '...')

    curl --silent --get localhost:8983/api/schema-designer/query \
      --data-urlencode "_=1658489222229" \
      --data-urlencode "configSet=eDisMaxTest1" \
      --data-urlencode 'q= +_query_:"{!edismax mm=75% df=name v=$qname}"
                           +_query_:"{!edismax mm=70% df=author v=$qauthor}"
                           +inStock:true' \
      --data-urlencode "qname= Game~ Thrones~ Clash~" \
      --data-urlencode "qauthor= George~ R.R.~ Martin~" \
      --data-urlencode "sort=books_id asc"
    

    cmd: (^ before newline continues line, even inside '...')

    curl --silent --get localhost:8983/api/schema-designer/query ^
      --data-urlencode "_=1658489222229" ^
      --data-urlencode "configSet=eDisMaxTest1" ^
      --data-urlencode 'q= +_query_:"{!edismax mm=75% df=name v=$qname}" ^
                           +_query_:"{!edismax mm=70% df=author v=$qauthor}" ^
                           +inStock:true' ^
      --data-urlencode "qname= Game~ Thrones~ Clash~" ^
      --data-urlencode "qauthor= George~ R.R.~ Martin~" ^
      --data-urlencode "sort=books_id asc"
    

    Response: two books as desired

    {
      ...
      "responseHeader":{
        ...
        "params":{
          "q":" +_query_:\"{!edismax mm=75% df=name v=$qname}\"\n
                +_query_:\"{!edismax mm=70% df=author v=$qauthor}\"\n
                +inStock:true",
          "qauthor":" George~ R.R.~ Martin~",
          "qname":" Game~ Thrones~ Clash~",
          "sort":"books_id asc",
          "configSet":"eDisMaxTest1",
          "wt":"javabin",
          "version":"2",
          "_":"1658489222229"}},
      "response":{"numFound":2,"start":0,"numFoundExact":true,"docs":[
          {
            "name":"Game Thrones Clash",
            "author":"George R.R. Martin",
            ...
            "books_id":553573403,
            "inStock":true},
          {
            "name":"Game Thrones",
            "author":"George Martin",
            ...
            "books_id":553573404,
            "inStock":true}]
      }}
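The test above goes through the Schema Designer endpoint. For the question's own setup (Solr 8.9.0, core testCore2), the same parameters should also work against the regular /select handler. A sketch, untested on 8.9.0, assuming the question's $solrIp variable (defaulting to localhost here) and the hypothetical function name nested_select. The single quotes keep $qname and $qauthor literal, so Solr, not the shell, dereferences them.

```shell
#!/bin/sh
# Sketch (assumption): the alternative request above, sent to the standard
# /select handler of the question's core testCore2 instead of the Schema
# Designer endpoint. Hypothetical function; $1 = name terms, $2 = author terms.
nested_select() {
  curl --silent --get "http://${solrIp:-localhost}:8983/solr/testCore2/select" \
    --data-urlencode 'q=+_query_:"{!edismax mm=75% df=name v=$qname}" +_query_:"{!edismax mm=70% df=author v=$qauthor}" +inStock:true' \
    --data-urlencode "qname=$1" \
    --data-urlencode "qauthor=$2" \
    --data-urlencode "sort=books_id asc"
}
```

Usage: nested_select 'Game~ Thrones~ Clash~' 'George~ R.R.~ Martin~'. With the question's schema and data it should return the same two books.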
    

    References

    [1] Nested Query Parser (Solr Reference Guide / Query Guide / Query Syntax and Parsers / Other Query Parsers) https://solr.apache.org/guide/solr/latest/query-guide/other-parsers.html#nested-query-parser

    [2] Nested Queries in Solr https://lucidworks.com/post/nested-queries-in-solr/

    [3] Local Params (Solr Reference Guide / Query Guide / Query Syntax and Parsers) https://solr.apache.org/guide/solr/latest/query-guide/local-params.html

    [4] Query type short form (Solr Reference Guide / Query Guide / Query Syntax and Parsers / Local Params) https://solr.apache.org/guide/solr/latest/query-guide/local-params.html#query-type-short-form

    [5] Extended DisMax (eDisMax) Query Parser (Solr Reference Guide / Query Guide / Query Syntax and Parsers) https://solr.apache.org/guide/solr/latest/query-guide/edismax-query-parser.html

    [6] mm (Minimum [Should-term] Match) Parameter (Solr Reference Guide / Query Guide / Query Syntax and Parsers / DisMax Query Parser) https://solr.apache.org/guide/solr/latest/query-guide/dismax-query-parser.html#mm-minimum-should-match-parameter

    [7] df [default field] (Solr Reference Guide / Query Guide / Query Syntax and Parsers / Standard Query Parser / Standard Query Parser Parameters) https://solr.apache.org/guide/solr/latest/query-guide/standard-query-parser.html#standard-query-parser-parameters