I need to query an Elasticsearch index to check whether any documents already have a specific value for the field shown below:
"name" : {
  "type" : "text",
  "fields" : {
    "raw" : {
      "type" : "keyword"
    }
  }
}
I was initially going to do this using a normalizer, but I'm hoping to avoid having to make changes to the index itself. I then found the match_phrase query, which does almost exactly what I need. The problem is that it will also return partial matches, i.e. longer values that contain the phrase. For example, if I search for the value "this is a test", it will return results for the following values:
this is a test 1
this is a test but i'm almost done now
this is a test again
In my situation I can do another check in code once the data is returned to confirm that it is in fact a case-insensitive exact match, but I'm relatively new to Elasticsearch and I'm wondering whether there is any way to structure my original match_phrase query so that it would not return the examples I posted above?
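The client-side check mentioned above could look something like the following minimal Python sketch. It assumes the search response has already been parsed into a dict with the usual hits.hits[]._source layout; the helper name exact_name_matches and the sample response are made up for illustration.

```python
def exact_name_matches(response, target, field="name"):
    """Keep only hits whose field equals target, ignoring case.

    Assumes `response` is an already-parsed search response dict
    (hits -> hits -> _source). `exact_name_matches` is a hypothetical
    helper, not part of any Elasticsearch client.
    """
    wanted = target.casefold()
    return [
        hit
        for hit in response.get("hits", {}).get("hits", [])
        if str(hit.get("_source", {}).get(field, "")).casefold() == wanted
    ]


# Hand-written sample shaped like a match_phrase response (not real data).
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "_source": {"name": "This Is A Test"}},
            {"_id": "2", "_source": {"name": "this is a test 1"}},
            {"_id": "3", "_source": {"name": "this is a test again"}},
        ]
    }
}

exact = exact_name_matches(sample_response, "this is a test")
print([hit["_id"] for hit in exact])
```

Only the first sample document survives the filter, because the other two are longer than the search value.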
For anyone who is interested, I found a few different ways to do this. The first is to run a match_phrase query and add a script filter that checks the length of the stored value:
GET definitions/_search
{
  "query": {
    "bool": {
      "must": {
        "match_phrase": {
          "name": {
            "query": "Test Name"
          }
        }
      },
      "filter": [
        {
          "script": {
            "script": {
              "source": "doc['name.raw'].value.length() == 9",
              "lang": "painless"
            }
          }
        }
      ]
    }
  }
}
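The hard-coded 9 only works because "Test Name" is nine characters long, so the number has to change with every search value. One way to keep the two in sync is to build the request body in code. The Python sketch below is illustrative only: build_length_checked_query and utf16_length are made-up helpers, and the length is passed through the script's params instead of being spliced into the source.

```python
import json


def utf16_length(text):
    # Painless strings are Java strings, so length() counts UTF-16 code units.
    return len(text.encode("utf-16-le")) // 2


def build_length_checked_query(value, text_field="name", keyword_field="name.raw"):
    """Build the match_phrase + length-script body for `value` (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                "must": {"match_phrase": {text_field: {"query": value}}},
                "filter": [
                    {
                        "script": {
                            "script": {
                                "source": f"doc['{keyword_field}'].value.length() == params.len",
                                "lang": "painless",
                                "params": {"len": utf16_length(value)},
                            }
                        }
                    }
                ],
            }
        }
    }


body = build_length_checked_query("Test Name")
print(json.dumps(body, indent=2))
```

The printed body can be sent to definitions/_search with whatever client is in use. Keeping the script source constant and varying only params should also let Elasticsearch reuse the compiled script across searches.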
Then I figured that if I could check the length in the script, maybe I could just do a case-insensitive comparison instead:
GET definitions/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "script": {
            "script": {
              "source": "doc['name.raw'].value.toLowerCase() == 'test name'",
              "lang": "painless"
            }
          }
        }
      ]
    }
  }
}
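With no other clause to narrow the candidates, this script likely has to be evaluated against every document in the index. A possible middle ground is to keep the match_phrase clause so the script only confirms documents that already contain the phrase. The Python sketch below builds such a body under that assumption; build_case_insensitive_query is a hypothetical helper, and the search value is lowercased on the client and handed to the script through params.

```python
def build_case_insensitive_query(value, text_field="name", keyword_field="name.raw"):
    """match_phrase narrows candidates; the script confirms full equality.

    Hypothetical helper. Lowercasing here uses Python's str.lower(), which
    may differ from Java's toLowerCase() for some non-ASCII characters.
    """
    return {
        "query": {
            "bool": {
                "must": {"match_phrase": {text_field: {"query": value}}},
                "filter": [
                    {
                        "script": {
                            "script": {
                                "source": f"doc['{keyword_field}'].value.toLowerCase() == params.target",
                                "lang": "painless",
                                "params": {"target": value.lower()},
                            }
                        }
                    }
                ],
            }
        }
    }


body = build_case_insensitive_query("Test Name")
script = body["query"]["bool"]["filter"][0]["script"]["script"]
print(script["source"], script["params"])
```

This assumes the name field uses an analyzer that lowercases tokens (the default standard analyzer does), so that any case-insensitive exact match is also a match_phrase hit.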
So those are the options. In my case I was concerned about performance, so we bit the bullet and created a normalizer that allows case-insensitive comparisons, and neither of these queries ended up being used. I figured I should still post them here, since I wasn't able to find these answers anywhere else.
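For reference, a lowercase normalizer on the raw sub-field could be set up roughly as follows. This is a sketch and not the configuration that was actually deployed: the normalizer name lowercase_normalizer is made up, the bodies are written as Python dicts to be sent with whatever client is in use, and the mapping uses the typeless format of Elasticsearch 7 and later.

```python
# Index settings and mapping with a lowercase normalizer on name.raw.
# "lowercase_normalizer" is an illustrative name, not taken from the post.
index_body = {
    "settings": {
        "analysis": {
            "normalizer": {
                "lowercase_normalizer": {"type": "custom", "filter": ["lowercase"]}
            }
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "fields": {
                    "raw": {"type": "keyword", "normalizer": "lowercase_normalizer"}
                },
            }
        }
    },
}


def build_exact_name_query(value, keyword_field="name.raw"):
    """Term query; the field's normalizer is applied to the search term too."""
    return {"query": {"term": {keyword_field: {"value": value}}}}


print(build_exact_name_query("Test Name"))
```

Because the normalizer runs both at index time and on the term at search time, a plain term query on name.raw then behaves as a case-insensitive exact match with no script involved.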