I'm having trouble with ArangoSearch.
Here is some dummy data that I have in a collection called things (for simplicity I have removed the "_id", "_key" and "_rev" properties from each document):
{"text":"eat a cookie"}
{"text":"I like cookies"}
{"text":"Timmy how are u"}
{"text":"I read a book on elves"}
And I have a view that looks like this (I am calling it practice):
{
  "writebufferIdle": 64,
  "type": "arangosearch",
  "primarySortCompression": "lz4",
  "links": {
    "things": {
      "analyzers": [
        "text_en",
        "identity"
      ],
      "fields": {
        "text": {
          "analyzers": [
            "text_en"
          ]
        }
      },
      "includeAllFields": true,
      "storeValues": "none",
      "trackListPositions": false
    }
  },
  "primarySort": [],
  "writebufferSizeMax": 33554432,
  "consolidationPolicy": {
    "type": "tier",
    "segmentsBytesFloor": 2097152,
    "segmentsBytesMax": 5368709120,
    "segmentsMax": 10,
    "segmentsMin": 1,
    "minScore": 0
  },
  "cleanupIntervalStep": 2,
  "commitIntervalMsec": 1000,
  "storedValues": [],
  "id": "138993",
  "globallyUniqueId": "h23A40B2F96C2/138993",
  "writebufferActive": 0,
  "consolidationIntervalMsec": 1000
}
When I run an AQL query like the following, it correctly returns 4:
FOR docs IN practice COLLECT WITH COUNT INTO num RETURN num
But when I do an AQL search like this, I mostly get empty arrays:
FOR doc IN practice
SEARCH ANALYZER(doc.text == "cookie", "text_en")
RETURN doc
(Weirdly, a word or two work with the above query but most don't. For example, "cookie" returns an empty array but "how" returns one match.)
Any idea what I am doing wrong?
Thanks
The indexed text field has text_en processing applied, but you aren't applying it to the search term:
ANALYZER(doc.text == "cookie", "text_en")
Here the ANALYZER() function only selects which analyzer's indexed data the condition is evaluated against. It does not run the analyzer over the string "cookie".
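Depending on how the analyzer transforms the stored attribute values, there can be a mismatch because of stemming. text_en lowercases and stems every word, so "cookie" and "cookies" are both indexed as the stem "cooki", and the unprocessed search term "cookie" never equals that token. Words the stemmer leaves unchanged, such as "how", still match, which is why a few of your searches work. All of the built-in text analyzers have stemming enabled.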
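To see what was actually written to the index, you can run the analyzer over the stored values yourself. A minimal sketch, assuming the collection is named things as in the question and that text_en is the analyzer linked to the text field (as in the view definition above):

```aql
// Apply text_en to each stored value to list the tokens the view indexes for it
FOR d IN things
  RETURN { text: d.text, tokens: TOKENS(d.text, "text_en") }
```

Comparing these token lists with the output of TOKENS() for your search word shows whether the two can ever match.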
Try RETURN TOKENS("cookie", "text_en") to see what the analyzer does to the word.
This should find the two documents containing "cookie" and "cookies":
ANALYZER(doc.text == TOKENS("cookie", "text_en")[0], "text_en")
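Taking only element [0] uses just the first token, which is fine for a single word but drops the rest of a multi-word input. A sketch of two alternatives, assuming the same practice view: IN TOKENS(...) matches documents that contain any of the analyzed tokens, and PHRASE() applies the given analyzer to its input itself and matches the tokens in order. Run each query separately.

```aql
// Query 1: match documents containing any token produced from the input
FOR doc IN practice
  SEARCH ANALYZER(doc.text IN TOKENS("cookie", "text_en"), "text_en")
  RETURN doc

// Query 2: PHRASE() runs text_en over the search text before matching
FOR doc IN practice
  SEARCH PHRASE(doc.text, "cookie", "text_en")
  RETURN doc
```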