Tags: elasticsearch, full-text-search, embedding, hybrid

Full-text and knn_vector hybrid search in Elasticsearch


I am currently working on a search engine and I've started to implement semantic search. I use the Open Distro version of Elasticsearch, and my mapping currently looks like this:

{
  "settings": {
    "index": {
      "knn": true,
      "knn.space_type": "cosinesimil"
    }
  },
  "mappings": {
    "properties": {
      "title": { 
        "type" : "text"
      },
      "data": { 
        "type" : "text"
      },
      "title_embeddings": {
        "type": "knn_vector", 
        "dimension": 600
      },
      "data_embeddings": {
        "type": "knn_vector", 
        "dimension": 600
      }
    }
  }
}
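The mapping fixes the vector length at 600, so every indexed embedding and every query vector must have exactly that many components. Below is a minimal sketch of building the same body client-side in Python and checking vectors before they are sent. `build_index_body` and `check_embedding` are hypothetical helper names and are not part of any client library.

```python
# Must equal the length of the vectors produced by your embedding model.
EMBEDDING_DIM = 600


def build_index_body(dim: int = EMBEDDING_DIM) -> dict:
    """Index settings and mappings from above, as a Python dict."""
    def knn_field() -> dict:
        return {"type": "knn_vector", "dimension": dim}

    return {
        "settings": {"index": {"knn": True, "knn.space_type": "cosinesimil"}},
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                "data": {"type": "text"},
                "title_embeddings": knn_field(),
                "data_embeddings": knn_field(),
            }
        },
    }


def check_embedding(vec, dim: int = EMBEDDING_DIM) -> list:
    """Validate a vector before indexing it or sending it as query_vec."""
    values = [float(x) for x in vec]
    if len(values) != dim:
        raise ValueError(f"expected {dim} values, got {len(values)}")
    # Cosine similarity divides by the vector norm, so it is undefined for zeros.
    if not any(values):
        raise ValueError("all-zero vector has no cosine similarity")
    return values
```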

For a basic knn_vector search I use this:

{
  "size": size,
  "query": {
    "script_score": {
      "query": {
        "match_all": { }
      },
      "script": {
        "source": "cosineSimilarity(params.query_value, doc[params.field1]) + cosineSimilarity(params.query_value, doc[params.field2])",
        "params": {
          "field1": "title_embeddings",
          "field2": "data_embeddings",
          "query_value": query_vec
        }
      }
    }
  }
}
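For reference, here is what the Painless script computes for each document, re-implemented in plain Python. This is an illustration only and not the plugin's code. Because `match_all` is the inner query, every document gets scored, which is why documents that don't contain the query words can still come back. Each cosine lies in [-1, 1], so the sum lies in [-2, 2]. Elasticsearch's own `script_score` documentation adds 1.0 to `cosineSimilarity` to keep scores non-negative, so an offset may be needed if you hit negative-score errors. The toy 2-dimensional vectors stand in for the real 600-dimensional ones.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def vector_score(query_vec, doc) -> float:
    """The script's per-document score: a sum of two cosines, in [-2, 2]."""
    return (cosine(query_vec, doc["title_embeddings"])
            + cosine(query_vec, doc["data_embeddings"]))


def rank_all(query_vec, docs, size):
    """match_all + script_score: score every document, keep the top `size` ids."""
    ranked = sorted(docs, key=lambda d: vector_score(query_vec, d), reverse=True)
    return [d["id"] for d in ranked[:size]]


docs = [
    {"id": "politics", "title_embeddings": [1.0, 0.0], "data_embeddings": [1.0, 0.1]},
    {"id": "cooking", "title_embeddings": [0.0, 1.0], "data_embeddings": [-1.0, 0.0]},
]
print(rank_all([1.0, 0.0], docs, size=2))
```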

I've also managed to get a kind of hybrid search with this:

{
  "size": size,
  "query": {
    "function_score": {
      "query": {
        "multi_match": { 
          "query": query,
          "fields": ["data", "title"]
        }
      },
      "script_score": {
        "script": {
          "source": "cosineSimilarity(params.query_value, doc[params.field1]) + cosineSimilarity(params.query_value, doc[params.field2])",
          "params": {
            "field1": "title_embeddings",
            "field2": "data_embeddings",
            "query_value": query_vec
          }
        }
      }
    }
  }
}

The problem is that a document is not returned unless it contains at least one of the query words. For example, with the first (vector-only) query, a search for "trump" (which does not appear in my dataset) still returns documents about social networks and politics. The hybrid search does not return these results.
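The difference comes from how `function_score` works. Its inner query (here the `multi_match`) decides which documents are hits at all, and `script_score` only changes the scores of those hits. A word that occurs in no document therefore yields no hits, whatever the vectors say. Below is a toy Python simulation. `text_matches` is a crude, hypothetical stand-in for `multi_match` that uses whitespace tokens with no analysis or BM25, and `sim` stands in for the precomputed cosine sum.

```python
def text_matches(query: str, doc: dict) -> bool:
    """Crude multi_match stand-in: does any query word occur in title or data?"""
    query_words = set(query.lower().split())
    doc_words = set(f"{doc['title']} {doc['data']}".lower().split())
    return bool(query_words & doc_words)


def function_score_hits(query: str, docs: list) -> dict:
    # 1. The inner multi_match query decides WHICH documents are hits.
    hits = [d for d in docs if text_matches(query, d)]
    # 2. script_score only rescores those hits (`sim` = precomputed cosine sum).
    return {d["id"]: d["sim"] for d in hits}


def match_all_hits(docs: list) -> dict:
    # The vector-only query starts from match_all, so every document is a hit.
    return {d["id"]: d["sim"] for d in docs}


docs = [
    {"id": "a", "title": "social networks", "data": "politics and elections", "sim": 1.6},
    {"id": "b", "title": "pasta recipes", "data": "cooking at home", "sim": 0.2},
]
print(function_score_hits("trump", docs))  # {}
print(match_all_hits(docs))                # {'a': 1.6, 'b': 0.2}
```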

I have also tried this:

{
  "size": size,
  "query": {
    "function_score": {
      "query": {
        "match_all": { }
      },
      "functions": [
      {
        "filter" : {
          "multi_match": { 
            "query": query,
            "fields": ["data", "title"]
          }
        },
        "weight": 1
      },
      {
        "script_score" : {
          "script" : {
            "source": "cosineSimilarity(params.query_value, doc[params.field1]) + cosineSimilarity(params.query_value, doc[params.field2])",
            "params": {
              "field1": "title_embeddings",
              "field2": "data_embeddings",
              "query_value": query_vec
            }
          }
        },
        "weight": 4
      }
      ],
      "score_mode": "sum",
      "boost_mode": "sum"
    }
  }
}

However, the multi_match part gives the same constant score to every matching document. I want the text part to rank documents the way a normal full-text query does. Any idea how to do that? Or should I use another strategy? Thanks in advance.
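The constant comes from the `filter`. A function with a filter adds its `weight` only when the filter matches, so how often or where the term occurs is ignored. With `score_mode: sum` and `boost_mode: sum`, the final score is the `match_all` score (1.0), plus that weight, plus 4 times the cosine sum. Below is a hypothetical Python simulation of that formula. It uses the same crude term-matching stand-in as above, and `sim` is again a precomputed cosine sum.

```python
def text_matches(query: str, doc: dict) -> bool:
    """Crude multi_match stand-in: does any query word occur in title or data?"""
    query_words = set(query.lower().split())
    doc_words = set(f"{doc['title']} {doc['data']}".lower().split())
    return bool(query_words & doc_words)


def filtered_score(query: str, doc: dict,
                   text_weight: float = 1.0, vector_weight: float = 4.0) -> float:
    query_score = 1.0  # match_all gives every document a score of 1.0
    # Function 1: filter + weight adds a flat text_weight if the filter matches,
    # regardless of term frequency or field length.
    text_part = text_weight if text_matches(query, doc) else 0.0
    # Function 2: the script_score result multiplied by its weight.
    vector_part = vector_weight * doc["sim"]
    # score_mode "sum" adds the functions; boost_mode "sum" adds the query score.
    return query_score + text_part + vector_part


once = {"title": "election", "data": "news", "sim": 0.5}
many = {"title": "election election", "data": "election results", "sim": 0.5}
print(filtered_score("election", once), filtered_score("election", many))
```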


Solution

  • With help from Archit Saxena, here is the solution to my problem:

    {
      "size": size,
      "query": {
        "function_score": {
          "query": {
            "bool": { 
              "should" : [
                {
                  "multi_match" : { 
                    "query": query,
                    "fields": ["data", "title"]
                  }
                },
                {
                  "match_all": { }
                }
              ],
              "minimum_should_match" : 0
            }
          },
          "functions": [
          {
            "script_score" : {
              "script" : {
                "source": "cosineSimilarity(params.query_value, doc[params.field1]) + cosineSimilarity(params.query_value, doc[params.field2])",
                "params": {
                  "field1": "title_embeddings",
                  "field2": "data_embeddings",
                  "query_value": query_vec
                }
              }
            },
            "weight": 20
          }
          ],
          "score_mode": "sum",
          "boost_mode": "sum"
        }
      }
    }
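This works because the `bool` query with a `match_all` clause makes every document a hit. The `multi_match` clause still adds its BM25 score to documents that contain the words, since `bool` sums the scores of the `should` clauses that match. `boost_mode: sum` then adds 20 times the cosine sum. Below is a sketch of a Python helper that builds this body, assuming the `size`, `query` and `query_vec` placeholders are filled in from Python. `build_hybrid_query`, its `offset` parameter (a hypothetical guard against negative cosine sums) and `expected_score` are illustrative names and are not part of any client library.

```python
COSINE_SUM = (
    "cosineSimilarity(params.query_value, doc[params.field1]) + "
    "cosineSimilarity(params.query_value, doc[params.field2])"
)


def build_hybrid_query(query, query_vec, size=10, vector_weight=20, offset=0.0):
    """Request body of the solution above; offset=0.0 reproduces it exactly."""
    source = f"{offset} + {COSINE_SUM}" if offset else COSINE_SUM
    return {
        "size": size,
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        "should": [
                            {"multi_match": {"query": query, "fields": ["data", "title"]}},
                            {"match_all": {}},  # keeps docs without the words
                        ],
                        "minimum_should_match": 0,
                    }
                },
                "functions": [{
                    "script_score": {"script": {
                        "source": source,
                        "params": {
                            "field1": "title_embeddings",
                            "field2": "data_embeddings",
                            "query_value": list(query_vec),
                        },
                    }},
                    "weight": vector_weight,
                }],
                "score_mode": "sum",
                "boost_mode": "sum",
            }
        },
    }


def expected_score(bm25, cos_title, cos_data, vector_weight=20):
    """bm25 is 0.0 when multi_match did not match; match_all always adds 1.0."""
    return bm25 + 1.0 + vector_weight * (cos_title + cos_data)
```

The resulting dict can be passed as the request body of a search call from a 7.x-era Python client. The weight of 20 is a tuning knob. BM25 scores are not normalized, while the cosine sum is at most 2, so the weight decides how much the vector part counts relative to the text part.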