
Search query for a phone number in Elasticsearch


I have a question about Elasticsearch.

I wrote a search query for a phone number. I want the query to return the phone number even when I leave out the hyphen and parentheses.

For example, the phone number is (213)234-1111 and the search query is:

GET _msearch
{ "query": {"fuzzy": { "tel": {"value": "2132341111", "max_expansions" : 100}}}}

The result is:

{
  "took" : 0,
  "responses" : [
    {
      "took" : 0,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 0,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [ ]
      },
      "status" : 200
    }
  ]
}

I need help making the search return the stored phone number, along with the rest of the document, even when I enter the number without the parentheses and hyphen.


Solution

  • To allow efficient querying, index the documents in a form that matches how you want to search them.

    In this example, phone numbers are indexed without the hyphens and parentheses, so they can be queried without those characters as well.

    Example:

    (1) Create the index:

    PUT my_index
    {
      "settings": {
        "analysis": {
          "analyzer": {
            "default": {
              "tokenizer": "standard",
              "char_filter": [
                "my_char_filter"
              ]
            }
          },
          "char_filter": {
            "my_char_filter": {
              "type": "pattern_replace",
              "pattern": "\\((\\d+)\\)(\\d+)-(\\d+)",
              "replacement": "$1$2$3"
            }
          }
        }
      }
    }
    

    (2) Add a document to the index:

    POST my_index/_doc
    {
      "Description": "My phone number is (213)234-1111"
    }
    

    (3) Query with the original phone number:

    GET my_index/_search
    {
      "query": {
        "match": {
          "Description": "(213)234-1111"
        }
      }
    }
    
    (1 result)
    

    (4) Query without special characters:

    GET my_index/_search
    {
      "query": {
        "match": {
          "Description": "2132341111"
        }
      }
    }
    
    (1 result)
    

    So how did that work?

    The pattern_replace char filter removes the parentheses and hyphen from any text shaped like "(213)234-1111" before it is tokenized, so the phone number is indexed as the single token "2132341111". Because the same analyzer is also applied at query time, we can search either with or without the special characters in the phone number and get a match in both cases.
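To see what the char filter does to the text, here is a minimal Python sketch that applies the same regular expression outside Elasticsearch. Python's re module is only a stand-in for the Java regex engine that Elasticsearch uses, and apply_char_filter is a hypothetical helper name, not part of any Elasticsearch API. It also omits the tokenization step that follows the char filter.

```python
import re

# Same regular expression as my_char_filter, written as a Python raw string.
PHONE_PATTERN = re.compile(r"\((\d+)\)(\d+)-(\d+)")


def apply_char_filter(text: str) -> str:
    # Hypothetical helper that approximates the pattern_replace char filter.
    # The Elasticsearch replacement "$1$2$3" is spelled \1\2\3 in Python.
    return PHONE_PATTERN.sub(r"\1\2\3", text)


# Text as it is rewritten at index time.
indexed = apply_char_filter("My phone number is (213)234-1111")

# Query strings as they are rewritten at search time.
query_with_punctuation = apply_char_filter("(213)234-1111")
query_digits_only = apply_char_filter("2132341111")

print(indexed)
print(query_with_punctuation)
print(query_digits_only)

# A number in another format does not match the pattern and is left unchanged.
print(apply_char_filter("213-234-1111"))
```

Both query strings end up as the same digit sequence that appears in the indexed text, which is why steps (3) and (4) each return the document. The last call shows a limitation of this particular pattern: it only rewrites numbers written as (area)prefix-line, so other formats would need a broader pattern. To check the behavior on a real cluster, the _analyze API can be run against my_index to list the tokens actually produced.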