azure-cognitive-search azure-search-.net-sdk

Azure Cognitive Search: issue with white space and wildcard search of special characters


We have an application that lets users enter anything in the Summary field. Users can type any special characters, such as #$!@~, as well as white space, and they want to be able to search on those special characters too. For example, one of the entries is "test testing **** #### !!!!! ???? @ $".

I created a Cognitive Search index in which the field uses the standard.lucene analyzer, as shown below:

{ "name": "Summary", "type": "Edm.String", "searchable": true, "filterable": true, "retrievable": true, "sortable": true, "facetable": true, "key": false, "indexAnalyzer": null, "searchAnalyzer": null, "analyzer": "standard.lucene", "synonymMaps": [] }

When I send the following query from Postman:

{ "top":"1000", "queryType": "full", "searchMode":"all", "search": "testing", "searchFields": "Summary", "count":true }

I get the expected result.

If I use the following query:

{ "top":"1000", "queryType": "full", "searchMode":"all", "search": "testing ****", "searchFields": "Summary", "count":true }

I get an "InvalidRequestParameter" error.

If I change the query to the following:

{ "top":"1000", "queryType": "full", "searchMode":"all", "search": ""****"", "searchFields": "Summary", "count":true }

Then I do not get any results back.

Per this article: https://learn.microsoft.com/en-us/azure/search/query-lucene-syntax#escaping-special-characters

"In order to use any of the search operators as part of the search text, escape the character by prefixing it with a single backslash (\). Special characters that require escaping include the following: + - & | ! ( ) { } [ ] ^ " ~ * ? : \ /"

So I need to prefix the special characters with a single backslash, but in my case that doesn't seem to work. Any help would be appreciated!


Solution

  • I finally resolved this by creating a custom analyzer. The field definition in the index:

    {
        "name": "FieldName",
        "type": "Edm.String",
        "searchable": true,
        "filterable": true,
        "retrievable": true,
        "sortable": true,
        "facetable": true,
        "key": false,
        "indexAnalyzer": null,
        "searchAnalyzer": null,
        "analyzer": "specialcharanalyzer",
        "synonymMaps": []
    },
    

    The analyzer is defined as follows:

    "analyzers": [
        {
            "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
            "name": "specialcharanalyzer",
            "tokenizer": "whitespace",
            "tokenFilters": [
                "lowercase"
            ],
            "charFilters": []
        }
    ],
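
To illustrate why the analyzer choice matters here, the following is a rough Python sketch of the two tokenization behaviors. It is not the actual Lucene code: `whitespace_lowercase_tokens` only mimics a whitespace tokenizer followed by a lowercase filter, and `rough_standard_tokens` is a crude stand-in (an assumption) for the way the standard analyzer discards punctuation and symbols. Both function names are hypothetical.

```python
import re

def whitespace_lowercase_tokens(text: str) -> list[str]:
    """Approximate the custom analyzer: split on white space, then lowercase."""
    return text.lower().split()

def rough_standard_tokens(text: str) -> list[str]:
    """Crude stand-in for standard.lucene: keep only runs of word characters."""
    return re.findall(r"\w+", text.lower())

sample = "test testing **** #### !!!!! ???? @ $"

# With whitespace tokenization the symbol runs survive as terms in the index.
print(whitespace_lowercase_tokens(sample))

# With the rough standard-style tokenization only the words remain,
# so there is nothing in the index for a query such as "****" to match.
print(rough_standard_tokens(sample))
```

Under these assumptions, the symbol-only terms exist in the index only when the whitespace tokenizer is used, which is consistent with the empty result seen with standard.lucene.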
    

    You can then use the query format described in this document: https://learn.microsoft.com/en-us/azure/search/query-lucene-syntax#special-characters

    Special characters that require escaping include the following:

    + - & | ! ( ) { } [ ] ^ " ~ * ? : \ /
    

    For characters that are not in the list above, use the following format for an infix search:

    "search": "/.*SearchChar.*/",
    

    For example, to search for $, use the following format:

    "search": "/.*$.*/",
    

    For special characters that are in the list, use this format:

    "search" : "/.*\\escapingcharacter.*/",
    

    For example, to search for +, use the following query:

    "search" : "/.*\\+.*/",
    

    # is also treated as a character that needs escaping when it appears in the search expression.

    To search for *, use this format:

    "search":"/\\**/",
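
The escaping rules above can be wrapped in a small helper. This is a minimal Python sketch, not part of any Azure SDK: `build_infix_search` and `build_request_body` are hypothetical names, the escape set is copied from the list above, and `#` is added only because of the observation in this answer. The request body mirrors the fields used in the question.

```python
import json

# Characters listed in the Lucene query syntax documentation as requiring escaping.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')
# '#' is included only because this answer reports that it also needs escaping.
EXTRA_ESCAPED = {"#"}

def build_infix_search(text: str) -> str:
    """Return a regex search string matching terms that contain `text`."""
    escaped = "".join(
        "\\" + ch if ch in LUCENE_SPECIAL or ch in EXTRA_ESCAPED else ch
        for ch in text
    )
    return f"/.*{escaped}.*/"

def build_request_body(text: str, field: str = "Summary") -> str:
    """Serialize a full-Lucene query; json.dumps doubles each backslash for JSON."""
    body = {
        "top": 1000,
        "queryType": "full",
        "searchMode": "all",
        "search": build_infix_search(text),
        "searchFields": field,
        "count": True,
    }
    return json.dumps(body)

print(build_request_body("$"))  # '$' is not in the escape set, so it is sent as-is
print(build_request_body("+"))  # '+' is escaped; the JSON text shows a doubled backslash
```

Because the whitespace tokenizer splits on spaces, each regex is matched against a single term, so this sketch is only meaningful for input that contains no white space.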