I have a question about Elasticsearch.
I wrote a search query for phone numbers. I want a phone number to be found even when I leave out the parentheses and the hyphen.
For example, the phone number is (213)234-1111 and the search query is:
GET _msearch
{ "query": {"fuzzy": { "tel": {"value": "2132341111", "max_expansions" : 100}}}}
The result is:
{
  "took" : 0,
  "responses" : [
    {
      "took" : 0,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 0,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [ ]
      },
      "status" : 200
    }
  ]
}
I need help making the search return the real phone number, along with the rest of the document, even when I enter the number without the parentheses and hyphen.
To query efficiently, index the documents in a form that matches the way you want to search them.
In the example below, phone numbers are indexed without the hyphen and parentheses, so queries can leave those characters out as well.
Example:
(1) Create the index:
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "\\((\\d+)\\)(\\d+)-(\\d+)",
          "replacement": "$1$2$3"
        }
      }
    }
  }
}
(2) Add a document to the index:
POST my_index/_doc
{
  "Description": "My phone number is (213)234-1111"
}
(3) Query with the original phone number:
GET my_index/_search
{
  "query": {
    "match": {
      "Description": "(213)234-1111"
    }
  }
}
(1 result)
(4) Query without special characters:
GET my_index/_search
{
  "query": {
    "match": {
      "Description": "2132341111"
    }
  }
}
(1 result)
So how did that work?
The pattern_replace char filter removes the parentheses and the hyphen from any text that matches the (213)234-1111 format, so the phone number is indexed as the single token "2132341111" (the stored _source still contains the original text). Because the same analyzer is also applied at query time, the search matches whether or not the query includes the special characters.
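To see what the char filter does to the text on both sides, here is a minimal Python sketch that applies the same regular expression outside Elasticsearch. It only emulates the character-replacement step (not the tokenizer or the rest of the analysis chain), and the function name strip_phone_punctuation is made up for illustration.

```python
import re

# Same regex as "my_char_filter" in the index settings above.
PHONE_PATTERN = re.compile(r"\((\d+)\)(\d+)-(\d+)")


def strip_phone_punctuation(text: str) -> str:
    """Rough stand-in for the pattern_replace char filter (not Elasticsearch itself)."""
    # The "$1$2$3" replacement in the settings corresponds to r"\1\2\3" in Python.
    return PHONE_PATTERN.sub(r"\1\2\3", text)


# Text as it would look after the char filter at index time.
indexed_text = strip_phone_punctuation("My phone number is (213)234-1111")

# Query strings as they would look after the char filter at search time.
query_with_punctuation = strip_phone_punctuation("(213)234-1111")
query_digits_only = strip_phone_punctuation("2132341111")

print(indexed_text)            # My phone number is 2132341111
print(query_with_punctuation)  # 2132341111
print(query_digits_only)       # 2132341111
```

Both query forms end up as the same digit string that appears in the indexed text, which is why both searches return the document. Note that the pattern only rewrites numbers written as (ddd)ddd-dddd; a number in another format, such as 213-234-1111, passes through this filter unchanged.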