I think conceptually what I'm trying to do is very logical and straightforward. But I haven't been able to figure out a way to do this.
I'm using ArangoDB 3.12.4.
If I have a document like:
product: apricot yoghurt
category: food
type: sugarfree
And I search for "sugarfree yoghurt"
I want to match the document above, but also match documents such as:
product: cherry yoghurt
category: food
type: fatfree

product: yoghurt starter
category: condiment
type: powder
But the first document (apricot yoghurt) should be ranked highest, because two of the search terms match it, across multiple fields.
I'm finding it fascinating that I still haven't been able to find any docs or any answered questions on this kind of use-case. And I'm starting to dread the fact that this may just not be supported.
One option is to have an extra field with a concatenation of all the fields I want to search. But then what if I want to boost the scores for certain fields?
If I understand correctly, you want to match every document with at least one matching token (either product, category, or type has to contain at least one of sugarfree, yoghurt, food). This can be expressed by searching for the tokens either with doc.field IN [ token1, token2, ... ] (comparison operator) or [ token1, token2, ... ] ANY == doc.field (array comparison operator) in each of the fields and combining these sub-expressions with logical OR.
Relevant docs: Search operators, Searching Full-text with ArangoSearch
More matching tokens in any of the document fields should result in a higher ranking. This is how the ranking functions work anyway. To adjust the relevance of certain fields, you can use the BOOST() function. Also see Query Time Relevance Tuning.
LET a = "text_en"
LET t = TOKENS("sugarfree yoghurt food", a)
FOR doc IN v
  SEARCH ANALYZER(doc.product IN t OR BOOST(doc.category IN t, 2) OR doc.type IN t, a)
  // or: SEARCH ANALYZER(t ANY == doc.product OR BOOST(t ANY == doc.category, 2) OR t ANY == doc.type, a)
  // or: SEARCH ANALYZER(MIN_MATCH(doc.product IN t, BOOST(doc.category IN t, 2), doc.type IN t, 1), a)
  LET score = BM25(doc)
  SORT score DESC
  RETURN MERGE(doc, {score})
(System attributes omitted)
category | product | type | score |
---|---|---|---|
food | apricot yoghurt | sugarfree | 3.0241031646728516 |
condiment | sugarfree yoghurt food | powder | 2.882746934890747 |
food | cherry yoghurt | fatfree | 1.6378090381622314 |
food | sugar beet | vegetable | 1.0779929161071777 |
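
For the two-term search from the question ("sugarfree yoghurt"), a variant with a separate weight per field might look like the sketch below. It assumes the same View v as the query above, and that the View indexes product, category, and type with the text_en Analyzer. The boost factors 3 and 2 are arbitrary illustration values, not recommendations, so the resulting scores will differ from the table above.

```aql
// Sketch only: the boost factors are hypothetical and should be tuned.
LET a = "text_en"
LET t = TOKENS("sugarfree yoghurt", a)
FOR doc IN v
  SEARCH ANALYZER(
    BOOST(doc.product IN t, 3)   // matches in product count the most
    OR BOOST(doc.type IN t, 2)   // matches in type count somewhat more
    OR doc.category IN t,        // matches in category keep the default weight
    a)
  LET score = BM25(doc)
  SORT score DESC
  RETURN { product: doc.product, category: doc.category, type: doc.type, score }
```

Because the sub-expressions are combined with OR, a document that matches in several fields collects a score contribution from each of them, which is what should push it ahead of documents that match in only one field.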
AQL query for the dataset:
LET products = [
  { product: "apricot yoghurt", category: "food", type: "sugarfree" },
  { product: "cherry yoghurt", category: "food", type: "fatfree" },
  { product: "sugarfree yoghurt food", category: "condiment", type: "powder" },
  { product: "sugar beet", category: "food", type: "vegetable" },
  { product: "bluray player", category: "electronics", type: "device" },
]
FOR p IN products INSERT p INTO @@coll