I have many queries like:
"iphone 10"
"iphone 10 case"
"iphone 11"
"iphqqq"
I would like to autosuggest the current or the next word (prefix-style matching is fine).
For example (input -> [outputs]):
"iph" -> ["iphone", "iphqqq"] ("iphone" appears only once in the suggestions)
"iphone" -> ["10", "11"]
"iphone 1" -> ["10", "11"]
"iphone 10" -> ["case"]
"10" -> [] (since there are no queries that start with "10")
"case" -> []
I looked into the completion suggester, but I couldn't tell whether it can remove duplicate suggestions, or how exactly to make it output only the current or the next word.
I don't mind pre-processing the queries or post-processing the outputs if the problem cannot be solved with Elasticsearch alone.
elasticsearch version: 8.10.2
Elasticsearch is not suitable for this kind of application. Elasticsearch's vector search returns similar queries according to a similarity metric: it transforms a query into an embedding (a vector, i.e. a list of numerical values), which is then compared to the embeddings of other existing queries.
It would be possible to implement this with Elasticsearch if there were a similarity metric like $S(q_1, q_2)$: if $q_1$ is a substring of $q_2$, then $q_1$ is similar to $q_2$. But such a metric doesn't fit, because it is binary rather than numerical. The similarity metrics Elasticsearch uses for vector search include Euclidean distance (L2 norm) and cosine similarity.
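To make the contrast concrete, here is a small Python sketch comparing the substring relation with cosine similarity. The function names are illustrative only and are not Elasticsearch APIs; the vectors are toy values, not real embeddings.

```python
from __future__ import annotations

import math


def substring_similarity(q1: str, q2: str) -> bool:
    # Hypothetical metric from the answer: only "similar" or "not similar".
    return q1 in q2


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Graded score: cosine of the angle between two (toy) embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


print(substring_similarity("iph", "iphone 10"))   # True
print(substring_similarity("case", "iphone 10"))  # False
print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))  # ≈ 0.7071
```

The substring check can only say yes or no, while cosine similarity yields a score that can be used to rank neighbours, which is what vector search relies on.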
To solve your problem efficiently, you can use a trie (prefix tree).
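A minimal Python sketch of the trie idea, assuming a word-level trie (each edge is a whole word rather than a character) built from the raw queries. The class and method names (`WordTrie`, `add`, `suggest`) are hypothetical, and the rule for choosing between completing the current word and suggesting the next one is an assumption inferred from the examples in the question: if the last typed token exactly matches a stored word, the following words are suggested; otherwise, stored words that start with that token are returned.

```python
from __future__ import annotations


class WordTrie:
    """Trie of nested dicts: each key is a whole word, each value its child node."""

    def __init__(self) -> None:
        self.root: dict[str, dict] = {}

    def add(self, query: str) -> None:
        # Walk/create one node per word of the query.
        node = self.root
        for word in query.split():
            node = node.setdefault(word, {})

    def suggest(self, text: str) -> list[str]:
        words = text.split()
        if not words:
            return sorted(self.root)
        *context, last = words
        # Follow the words typed before the last token; unknown path -> no suggestions.
        node = self.root
        for word in context:
            if word not in node:
                return []
            node = node[word]
        # Assumption: an exact match means the word is complete, so suggest the next word.
        if last in node:
            return sorted(node[last])
        # Otherwise complete the current word by prefix.
        return sorted(w for w in node if w.startswith(last))


trie = WordTrie()
for q in ["iphone 10", "iphone 10 case", "iphone 11", "iphqqq"]:
    trie.add(q)

print(trie.suggest("iph"))        # ['iphone', 'iphqqq']
print(trie.suggest("iphone"))     # ['10', '11']
print(trie.suggest("iphone 1"))   # ['10', '11']
print(trie.suggest("iphone 10"))  # ['case']
print(trie.suggest("10"))         # []
print(trie.suggest("case"))       # []
```

Because sibling words are dictionary keys, each word appears at most once among the suggestions, which gives the de-duplication asked for without extra post-processing. One trade-off of the exact-match rule: with stored queries "iphone 10" and "iphone 100", typing "iphone 10" would suggest the words after "10" instead of also offering "100".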