I am currently working on a search engine and I have started to implement semantic search. I use the Open Distro version of Elasticsearch, and my mapping currently looks like this:
{
  "settings": {
    "index": {
      "knn": true,
      "knn.space_type": "cosinesimil"
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "data": {
        "type": "text"
      },
      "title_embeddings": {
        "type": "knn_vector",
        "dimension": 600
      },
      "data_embeddings": {
        "type": "knn_vector",
        "dimension": 600
      }
    }
  }
}
For basic knn_vector search I use this:
{
  "size": size,
  "query": {
    "script_score": {
      "query": {
        "match_all": { }
      },
      "script": {
        "source": "cosineSimilarity(params.query_value, doc[params.field1]) + cosineSimilarity(params.query_value, doc[params.field2])",
        "params": {
          "field1": "title_embeddings",
          "field2": "data_embeddings",
          "query_value": query_vec
        }
      }
    }
  }
}
and I have managed to get a kind of hybrid search with this:
{
  "size": size,
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": query,
          "fields": ["data", "title"]
        }
      },
      "script_score": {
        "script": {
          "source": "cosineSimilarity(params.query_value, doc[params.field1]) + cosineSimilarity(params.query_value, doc[params.field2])",
          "params": {
            "field1": "title_embeddings",
            "field2": "data_embeddings",
            "query_value": query_vec
          }
        }
      }
    }
  }
}
The problem is that if the search word does not appear in a document, that document is not returned. For example, with the first search query, when I search for trump (which is not in my dataset), I still get documents about social networks and politics. I don't get these results with the hybrid search.
I have tried this:
{
  "size": size,
  "query": {
    "function_score": {
      "query": {
        "match_all": { }
      },
      "functions": [
        {
          "filter": {
            "multi_match": {
              "query": query,
              "fields": ["data", "title"]
            }
          },
          "weight": 1
        },
        {
          "script_score": {
            "script": {
              "source": "cosineSimilarity(params.query_value, doc[params.field1]) + cosineSimilarity(params.query_value, doc[params.field2])",
              "params": {
                "field1": "title_embeddings",
                "field2": "data_embeddings",
                "query_value": query_vec
              }
            }
          },
          "weight": 4
        }
      ],
      "score_mode": "sum",
      "boost_mode": "sum"
    }
  }
}
but the multi_match part gives a constant score to every document that matches, and I want the text match to rank my documents the way a normal full-text query does. Any idea how to do this? Or should I use another strategy? Thank you in advance.
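To make the constant score concrete, here is a rough Python sketch of how function_score combines scores for the query above when score_mode and boost_mode are both "sum". The function name `attempt_score` and the sample cosine values are made up for illustration. The only inputs are whether the filter matched and the two cosine similarities, so how well a document matches the text never enters the result.

```python
MATCH_ALL_SCORE = 1.0  # match_all gives every document the same query score


def attempt_score(matches_text, cos_title, cos_data,
                  text_weight=1.0, vector_weight=4.0):
    """Model the score of one document under the filter-based attempt.

    A function that has only a filter and a weight contributes exactly its
    weight when the filter matches and nothing otherwise.
    """
    text_part = text_weight if matches_text else 0.0
    vector_part = vector_weight * (cos_title + cos_data)
    # score_mode "sum" adds the functions together,
    # boost_mode "sum" adds that total to the query score.
    return MATCH_ALL_SCORE + text_part + vector_part


# A document containing the query word versus one that does not,
# both with the same (illustrative) cosine similarities.
with_word = attempt_score(True, 0.5, 0.25)
without_word = attempt_score(False, 0.5, 0.25)
print(with_word - without_word)  # the gap is always text_weight
```

Under this model, a strong text match and a weak one get the same bonus, which is the behaviour I am seeing.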
With the help of Archit Saxena, here is the solution to my problem:
{
  "size": size,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            {
              "multi_match": {
                "query": query,
                "fields": ["data", "title"]
              }
            },
            {
              "match_all": { }
            }
          ],
          "minimum_should_match": 0
        }
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "source": "cosineSimilarity(params.query_value, doc[params.field1]) + cosineSimilarity(params.query_value, doc[params.field2])",
              "params": {
                "field1": "title_embeddings",
                "field2": "data_embeddings",
                "query_value": query_vec
              }
            }
          },
          "weight": 20
        }
      ],
      "score_mode": "sum",
      "boost_mode": "sum"
    }
  }
}
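For anyone who wants to reuse this, here is a minimal sketch of building the same request body in Python. `build_hybrid_query` and its `vector_weight` parameter are hypothetical names I chose for illustration, and sending the body to the cluster is left out. The idea is that match_all keeps every document in the result set, while a matching multi_match clause adds its normal relevance score on top, so the text part is no longer constant.

```python
import json

# Same script as in the queries above: sum of the two cosine similarities.
SCRIPT_SOURCE = (
    "cosineSimilarity(params.query_value, doc[params.field1]) + "
    "cosineSimilarity(params.query_value, doc[params.field2])"
)


def build_hybrid_query(query, query_vec, size=10, vector_weight=20):
    """Return the hybrid request body as a plain dict (hypothetical helper)."""
    return {
        "size": size,
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        "should": [
                            # Adds a full-text relevance score when it matches.
                            {"multi_match": {"query": query,
                                             "fields": ["data", "title"]}},
                            # Keeps documents that do not contain the word.
                            {"match_all": {}},
                        ],
                        "minimum_should_match": 0,
                    }
                },
                "functions": [
                    {
                        "script_score": {
                            "script": {
                                "source": SCRIPT_SOURCE,
                                "params": {
                                    "field1": "title_embeddings",
                                    "field2": "data_embeddings",
                                    "query_value": query_vec,
                                },
                            }
                        },
                        "weight": vector_weight,
                    }
                ],
                "score_mode": "sum",
                "boost_mode": "sum",
            }
        },
    }


# Example with a dummy 600-dimensional vector, matching the mapping above.
body = build_hybrid_query("trump", [0.0] * 600, size=5)
request_json = json.dumps(body)  # ready to send as the search request body
```

The weight of 20 is just what worked on my data; it sets how much the vector similarity counts relative to the text score, so it probably needs tuning for other datasets.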