[SOLVED] How to execute a structured query containing symbols in AWS Cloudsearch

I'm trying to execute a structured prefix query in Cloudsearch.

Here's a snippet of the query args (csattribute is of type text):

{
    "query": "(prefix field=csattribute '12-3')",
    "queryParser": "structured",
    "size": 5
}

The query above returns: No matches for "(prefix field=csattribute '12-3')".

However, if I change my query to

{
    "query": "(prefix field=csattribute '12')",
    "queryParser": "structured",
    "size": 5
}

then I get the results I expect.

I haven't found much in my brief googling. How do I include the hyphen (-) in the query? Does it need to be escaped? Are there other characters that need to be escaped?

Solution

I was pointed in the right direction by this SO question: How To search special symbols AWS Search

Below is a snippet from https://docs.aws.amazon.com/cloudsearch/latest/developerguide/text-processing.html:

Text Processing in Amazon CloudSearch ... During tokenization, the stream of text in a field is split into separate tokens on detectable boundaries using the word break rules defined in the Unicode Text Segmentation algorithm.

According to the word break rules, strings separated by whitespace such as spaces and tabs are treated as separate tokens. In many cases, punctuation is dropped and treated as whitespace. For example, strings are split at hyphens (-) and the at symbol (@). However, periods that are not followed by whitespace are considered part of the token.

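From what I understand, text and text-array fields are tokenized according to the field's analysis scheme (English, in my case). Because the hyphen is treated as a word break, a value containing one is split into separate tokens at index time, so no single indexed token starts with 12-3 and the prefix query finds nothing.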
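To make the effect concrete, here is a rough Python sketch of the behavior described above. It is not CloudSearch's actual analyzer: rough_tokens is a hypothetical stand-in that only splits on whitespace, hyphens and the at symbol, the sample value "12-34" is made up, and the assumption that a prefix is compared against each indexed token as-is is my reading of the docs rather than something they state.

```python
import re


def rough_tokens(text):
    """Hypothetical stand-in for the analyzer: split on whitespace, '-' and '@'."""
    return [tok for tok in re.split(r"[\s\-@]+", text.lower()) if tok]


def prefix_match_text(value, prefix):
    """Assumed behavior for a text field: the prefix is tested against each token."""
    return any(tok.startswith(prefix.lower()) for tok in rough_tokens(value))


def prefix_match_literal(value, prefix):
    """Assumed behavior for a literal field: the whole value is a single term."""
    return value.startswith(prefix)


value = "12-34"  # made-up field value for illustration
print(rough_tokens(value))                  # tokens an indexed text field might hold
print(prefix_match_text(value, "12"))       # prefix without the hyphen
print(prefix_match_text(value, "12-3"))     # prefix with the hyphen, text field
print(prefix_match_literal(value, "12-3"))  # prefix with the hyphen, literal field
```

Under these assumptions the hyphenated prefix can only match when the value is kept whole, which is consistent with what I saw in my queries.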

This field doesn't need to be tokenized. Changing the index field type to literal prevents tokenization of the field, so the query in my question now returns the expected results.
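For anyone making the same change from code rather than the console, a minimal sketch with boto3 might look like the following. The domain name "my-domain" and both helper function names are hypothetical, the LiteralOptions shown are just one plausible choice, and I have not verified this against every domain configuration. A configuration change like this only takes effect after the domain is re-indexed, which is why index_documents is called at the end.

```python
def literal_field_definition(field_name):
    """Build the IndexField structure for a searchable, returnable literal field."""
    return {
        "IndexFieldName": field_name,
        "IndexFieldType": "literal",
        "LiteralOptions": {"SearchEnabled": True, "ReturnEnabled": True},
    }


def switch_field_to_literal(domain_name, field_name):
    """Replace the field's configuration with a literal definition and re-index."""
    import boto3  # imported here so the helper above works without boto3 installed

    client = boto3.client("cloudsearch")
    client.define_index_field(
        DomainName=domain_name,
        IndexField=literal_field_definition(field_name),
    )
    # Configuration changes are only applied once the domain has been re-indexed.
    client.index_documents(DomainName=domain_name)


# Example call (hypothetical domain name; needs AWS credentials and a real domain):
# switch_field_to_literal("my-domain", "csattribute")
```

Keep in mind that a literal field is matched as an exact, case-sensitive string, so this only makes sense for identifier-like values that never need full-text search.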