Tags: elasticsearch, tokenize, query-analyzer

ElasticSearch search for special characters with pattern analyzer


I'm currently using a custom analyzer whose tokenizer pattern is set to (\W|_)+, so each term contains only letters and the input is split on any non-letter character. As an example, I have one document containing [dbo].[Material_Get] and another containing dbo.Another_Material_Get. I want a search for "Material_Get" to hit both documents, but a search for "[Material_Get]" still hits dbo.Another_Material_Get even though it doesn't contain the brackets. Also, if I search for "Material Get" (as a quoted phrase), I shouldn't get any hits, since neither document contains that phrase.

I could settle for an analyzer/tokenizer that matches whenever the search string appears anywhere in the field, even with other characters next to it. For example, searching for "aterial_get" would match both documents. Is it possible to do either of these?
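
For reference, my analyzer is set up roughly like the following (the analyzer and tokenizer names here are just placeholders):

    {
      "settings": {
        "analysis": {
          "analyzer": {
            "letters_only": {
              "type": "custom",
              "tokenizer": "letters_tokenizer"
            }
          },
          "tokenizer": {
            "letters_tokenizer": {
              "type": "pattern",
              "pattern": "(\\W|_)+"
            }
          }
        }
      }
    }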


Solution

  • From what you have explained, I understand that you also want partial matches, such as searching for "aterial_get".

    To satisfy all of your requirements, you need to change the mapping of your field so the analyzer uses an ngram token filter and does not strip the special characters. A sample analyzer can look like this:

    {
      "settings":{
        "analysis":{
          "analyzer":{
            "partialmatch":{
              "type":"custom",
              "tokenizer":"keyword",
              "filter":[ "lowercase", "ngram" ] 
            }
          },
          "filter":{
            "ngram":{
              "type":"ngram",
              "min_gram":2,
              "max_gram":15
            }
          }
        }
      }
    }
    

    In your mapping, set the analyzer of your_field to the "partialmatch" analyzer defined above; a sketch follows below. You can change the values of min_gram and max_gram to suit your needs.
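
    A minimal mapping sketch (assuming Elasticsearch 7+ mapping syntax; the field name your_field and the text type are placeholders):

    {
      "mappings": {
        "properties": {
          "your_field": {
            "type": "text",
            "analyzer": "partialmatch"
          }
        }
      }
    }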

    With this mapping in place, you can run a normal term query like the one below:

    {
        "query": {
            "term": {
                "your_field": "aterial_get"
            }
        }
    }
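
    To see which tokens the ngram analyzer actually emits (useful when tuning min_gram and max_gram), you can send a body like the following to the _analyze endpoint of your index, e.g. POST /your_index/_analyze (the index name is a placeholder):

    {
      "analyzer": "partialmatch",
      "text": "[dbo].[Material_Get]"
    }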