I'm trying to execute a structured prefix query in CloudSearch.
Here's a snippet of the query args (csattribute is of type text):
{
"query": "(prefix field=csattribute '12-3')",
"queryParser": "structured",
"size": 5
}
The query above returns: No matches for "(prefix field=csattribute '12-3')".
However, if I change my query to
{
"query": "(prefix field=csattribute '12')",
"queryParser": "structured",
"size": 5
}
then I get the list of results I expect.
I haven't found much in my brief googling. How do I include the - in the query? Does it need to be escaped? Are there other characters that need to be escaped?
I got pointed in the right direction by this SO question: How To search special symbols AWS Search
Below is a snippet from https://docs.aws.amazon.com/cloudsearch/latest/developerguide/text-processing.html
Text Processing in Amazon CloudSearch ... During tokenization, the stream of text in a field is split into separate tokens on detectable boundaries using the word break rules defined in the Unicode Text Segmentation algorithm.
According to the word break rules, strings separated by whitespace such as spaces and tabs are treated as separate tokens. In many cases, punctuation is dropped and treated as whitespace. For example, strings are split at hyphens (-) and the at symbol (@). However, periods that are not followed by whitespace are considered part of the token.
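From what I understand, text and text-array fields are tokenized according to the analysis scheme (in my case, English). The field value was tokenized, and the - symbol is treated as a word break, so a value such as 12-3 is indexed as the separate tokens 12 and 3 and no single token starts with 12-3.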
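To make the effect concrete, here is a minimal Python sketch of why the prefix stops matching. It is only a rough stand-in for CloudSearch's behavior: the helper names `rough_tokens` and `prefix_matches_text_field` are made up for illustration, and the regex only imitates the hyphen, @ and whitespace splitting described in the quoted docs, not the full Unicode Text Segmentation rules or the english analysis scheme.

```python
import re


def rough_tokens(text):
    """Rough approximation (an assumption, not CloudSearch's real tokenizer):
    lowercase the text and split on whitespace, hyphens and '@'."""
    return [tok for tok in re.split(r"[\s\-@]+", text.lower()) if tok]


def prefix_matches_text_field(value, prefix):
    """Model a prefix query against a tokenized text field:
    the prefix has to match the start of one individual token."""
    return any(tok.startswith(prefix.lower()) for tok in rough_tokens(value))


stored_value = "12-345"  # hypothetical value of csattribute
print(rough_tokens(stored_value))
print(prefix_matches_text_field(stored_value, "12"))
print(prefix_matches_text_field(stored_value, "12-3"))
```

Under these assumptions the stored value becomes the tokens 12 and 345, so the prefix 12 matches a token while 12-3 matches none, which is consistent with what I saw.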
This field doesn't need to be tokenized. Changing the index field type to literal prevents all tokenization on the field, which allows the query in my question to return the expected results.
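For anyone who prefers to script the change, below is a hedged sketch using boto3's cloudsearch client (`define_index_field` followed by `index_documents`). The domain name `my-domain`, the helper names and the chosen `LiteralOptions` are assumptions for illustration, not something from my actual setup; I made the change through the console, so treat this as untested against a real domain.

```python
def literal_field_request(domain_name, field_name):
    """Build the arguments for a DefineIndexField call that declares
    the field as an untokenized literal. Option values are assumptions."""
    return {
        "DomainName": domain_name,
        "IndexField": {
            "IndexFieldName": field_name,
            "IndexFieldType": "literal",
            "LiteralOptions": {"SearchEnabled": True, "ReturnEnabled": True},
        },
    }


def apply_literal_field(domain_name, field_name):
    """Send the field definition and trigger a re-index (needs AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3 installed

    client = boto3.client("cloudsearch")
    client.define_index_field(**literal_field_request(domain_name, field_name))
    # Field configuration changes only take effect after the domain is re-indexed.
    client.index_documents(DomainName=domain_name)


# Hypothetical usage:
# apply_literal_field("my-domain", "csattribute")
```

Keep in mind that literal fields are matched exactly and are case-sensitive, so this only suits values like IDs or codes that should not be analyzed as text.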