I have the following documents:
{ "id": 1, "city": "New York" },
{ "id": 2, "city": "New Orleans" },
{ "id": 3, "city": "Old York" }
When I search for "Pizza in New York", I want to get only the document with id: 1.
How can I create such a query?
You could use the shingle token filter.
Mapping
PUT /cities
{
"settings": {
"analysis": {
"analyzer": {
"city_analyzer": {
"tokenizer": "keyword",
"filter": ["lowercase"]
},
"shingle_analyzer": {
"lowercase": true,
"tokenizer": "standard",
"filter": ["lowercase", "stop", "shingle_1_3"]
}
},
"filter": {
"shingle_1_3": {
"type": "shingle",
"min_shingle_size": 2,
"max_shingle_size": 3,
"output_unigrams": true
}
}
}
},
"mappings": {
"properties": {
"city": {
"type": "text",
"analyzer": "city_analyzer"
}
}
}
}
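The city field is indexed with city_analyzer, which lowercases the whole value and keeps it as a single token (for example, "new york").
At search time the query is analyzed with shingle_analyzer, which is passed in the query itself. It splits the query text into single words plus 2-word and 3-word shingles, so a multi-word city name such as "new york" appears among the query tokens as one token and can match the indexed term exactly.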
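To make the matching logic concrete, here is a rough Python model of the two analyzers. It is a sketch under assumptions, not Elasticsearch's implementation: the function names and the tiny stop-word set are made up for illustration, whitespace splitting stands in for the standard tokenizer, and a removed stop word is represented by a `_` placeholder, so the exact tokens Elasticsearch emits may differ.

```python
# Illustrative subset only; Elasticsearch's "stop" filter uses its own English list.
STOP_WORDS = {"in", "the", "a", "an", "of"}


def keyword_term(text):
    """Model of city_analyzer: the whole value becomes one lowercased term."""
    return text.lower()


def shingle_terms(text, min_size=2, max_size=3, output_unigrams=True):
    """Rough model of shingle_analyzer: lowercase, mark stop words, build word n-grams."""
    # A removed stop word leaves a gap; it is modelled here with a "_" placeholder.
    tokens = ["_" if t in STOP_WORDS else t for t in text.lower().split()]
    terms = set()
    if output_unigrams:
        terms.update(t for t in tokens if t != "_")
    for size in range(min_size, max_size + 1):
        for start in range(len(tokens) - size + 1):
            terms.add(" ".join(tokens[start:start + size]))
    return terms


# Indexed side: one exact term per document.
cities = ["New York", "New Orleans", "Old York", "York"]
indexed = {keyword_term(c): c for c in cities}


def matching_cities(query, **kwargs):
    """Return cities whose single indexed term equals any query-side term."""
    terms = shingle_terms(query, **kwargs)
    return sorted(city for term, city in indexed.items() if term in terms)


print(matching_cities("Pizza in New York"))
```

In this model, passing `output_unigrams=False` leaves only "New York" for the query above, but a query consisting of just "York" then produces no terms at all and matches nothing.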
Your documents plus one of mine (York)
POST /cities/_bulk
{"create":{}}
{"city":"New York"}
{"create":{}}
{"city":"New Orleans"}
{"create":{}}
{"city":"Old York"}
{"create":{}}
{"city":"York"}
Search for "Pizza in New York"
GET /cities/_search?filter_path=hits.hits
{
"query": {
"match": {
"city": {
"query": "Pizza in New York",
"analyzer": "shingle_analyzer"
}
}
}
}
Response
{
"hits" : {
"hits" : [
{
"_index" : "cities",
"_type" : "_doc",
"_id" : "myQonpUBApSWQ18JuXcs",
"_score" : 1.2039728,
"_source" : {
"city" : "New York"
}
},
{
"_index" : "cities",
"_type" : "_doc",
"_id" : "niQonpUBApSWQ18JuXcs",
"_score" : 1.2039728,
"_source" : {
"city" : "York"
}
}
]
}
}
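Yes, York is also in the hits: output_unigrams is true, so single words are emitted as query tokens too. That is needed because there are single-word city names as well.
Searching for "York" returns only the York document: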
{
"hits" : {
"hits" : [
{
"_index" : "cities",
"_type" : "_doc",
"_id" : "niQonpUBApSWQ18JuXcs",
"_score" : 1.2039728,
"_source" : {
"city" : "York"
}
}
]
}
}
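If only the most specific city is wanted (just "New York" for the first query), one possible option, which is not part of the query above, is to post-filter the hits on the client and keep those with the most words in the city name. The sketch below assumes a response shaped like the ones shown here; `most_specific_hits` is a hypothetical helper, not an Elasticsearch API.

```python
def most_specific_hits(response):
    """Keep only the hits whose city name has the most words.

    Assumes each hit has a "_source" with a "city" string, as in the
    responses shown above. Hypothetical helper for illustration.
    """
    hits = response.get("hits", {}).get("hits", [])
    if not hits:
        return []
    # Word count of each matched city name.
    sizes = [len(hit["_source"]["city"].split()) for hit in hits]
    longest = max(sizes)
    return [hit for hit, size in zip(hits, sizes) if size == longest]


# Trimmed-down copy of the first response (only the fields used here).
example_response = {
    "hits": {
        "hits": [
            {"_source": {"city": "New York"}},
            {"_source": {"city": "York"}},
        ]
    }
}

for hit in most_specific_hits(example_response):
    print(hit["_source"]["city"])
```

This keeps ties, so a query that mentions two cities of equal length would still return both.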