elasticsearch search whitespace hyphenation

Elasticsearch: searching hyphenated text with a space instead of a dash in the query


I have indexed person documents with firstName = "Jean-Marc", and I would like to find this person whether the query uses a dash or a space. For example, both "Jean-Marc" and "Jean Marc" should match.

Here is the mapping:

  "firstName": {
    "type": "keyword",
    "normalizer": "keyword_normalizer",
    "fields": {
      "analysed": {
        "type": "text",
        "analyzer": "hyphen_analyzer",
        "search_analyzer": "standard",
        "fielddata": true
      }
    }
  }

And the settings:

"char_filter": {
    "allowOnlyChar": {
        "pattern": "[^A-Za-z]",
        "type": "pattern_replace",
        "replacement": " "
    }
}

"analyzer": {
    "hyphen_analyzers": {
        "filter": "lowercase",
        "char_filter": [
            "allowOnlyChar"
        ],
        "type": "custom",
        "tokenizer": "standard"
    }
}

I get the person when I keep the dash in the query, but no results when I replace it with a space.

I am using Elasticsearch 6.2.4.


Solution

  • Define your analyzer:

    "char_filter": {
        "allowOnlyChar": {
            "pattern": "[^A-Za-z]",
            "type": "pattern_replace",
            "replacement": " "
        }
    },

    "analyzer": {
        "yourAnalyzer": {
            "filter": [
                "lowercase"
            ],
            "char_filter": [
                "allowOnlyChar"
            ],
            "type": "custom",
            "tokenizer": "standard"
        }
    }

    And of course, set it on the field in your mapping ("analyzer": "yourAnalyzer") and index your documents with it.

    Link to the docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern-replace-charfilter.html
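
To see why this analyzer makes both spellings match, here is a rough Python sketch of the analysis chain. It is only an approximation and does not call Elasticsearch: the `analyze` function is a hypothetical helper, and splitting on whitespace stands in for the standard tokenizer, which should behave the same way for text that contains only ASCII letters and spaces.

```python
import re

# Mirrors the "allowOnlyChar" char filter: anything that is not an
# ASCII letter is replaced with a single space.
NON_LETTER = re.compile(r"[^A-Za-z]")


def analyze(text):
    """Approximate the custom analyzer: char filter, tokenizer, lowercase."""
    filtered = NON_LETTER.sub(" ", text)   # "Jean-Marc" becomes "Jean Marc"
    tokens = filtered.split()              # stand-in for the standard tokenizer
    return [token.lower() for token in tokens]


if __name__ == "__main__":
    # Both spellings should produce the same token list.
    print(analyze("Jean-Marc"))
    print(analyze("Jean Marc"))
```

Since the indexed value and both query spellings reduce to the same tokens, a match query on the analysed sub-field (`firstName.analysed` in the question's mapping) should find the document either way. The parent `firstName` field is a keyword and is matched as a single term, so a query against it would not benefit from this. Note also that the pattern `[^A-Za-z]` replaces accented letters too, so a name such as "José" would be indexed as "jos".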