In my Elasticsearch index I have duplicate docs where some "unique" fields have the same values.
In order to fix them, I have to find them, so I'm using an aggregation query with min_doc_count=2. The problem is that I can only get it to work with one key, not with a pair of keys. This version works:
GET /my_index/_search
{
  "size": 0,
  "aggs": {
    "receipts": {
      "terms": {
        "field": "key1",
        "min_doc_count": 2,
        "size": 1000000
      }
    }
  }
}
I'd like to find docs where two fields match simultaneously, but how do I add a second field, key2? Any ideas?
I tried a multi_terms aggregation, like this (I don't know if the syntax is correct):
GET /my_index/_search
{
  "size": 0,
  "aggs": {
    "receipts": {
      "multi_terms": {
        "terms": [
          {
            "field": "key1"
          },
          {
            "field": "key2"
          }
        ],
        "min_doc_count": 2,
        "size": 1000000
      }
    }
  }
}
but I get this error:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "parsing_exception",
        "reason" : "Unknown aggregation type [multi_terms] did you mean [rare_terms]?",
        "line" : 5,
        "col" : 26
      }
    ],
    "type" : "parsing_exception",
    "reason" : "Unknown aggregation type [multi_terms] did you mean [rare_terms]?",
    "line" : 5,
    "col" : 26,
    "caused_by" : {
      "type" : "named_object_not_found_exception",
      "reason" : "[5:26] unknown field [multi_terms]"
    }
  },
  "status" : 400
}
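Your multi_terms syntax looks fine, but that aggregation was only added in Elasticsearch 7.12, so the "Unknown aggregation type" error suggests your cluster is running an older version. You can also use a script in a regular terms aggregation to do this: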
GET /docs/_search
{
  "size": 0,
  "aggs": {
    "receipts": {
      "terms": {
        "script": "doc['key1'].value + '_' + doc['key2'].value",
        "min_doc_count": 2,
        "size": 1000000
      }
    }
  }
}
Keep in mind that a scripted terms aggregation can be slower than a terms aggregation on a plain field, since the script has to run for every matching document.
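To make the grouping concrete, here is a minimal Python sketch of what the script effectively does to each document: build one combined key, count occurrences, and keep only keys seen at least min_doc_count times. The function name `duplicate_keys` and the in-memory `docs` list are made up for illustration; nothing here talks to Elasticsearch.

```python
from collections import Counter


def duplicate_keys(docs, fields=("key1", "key2"), sep="_", min_doc_count=2):
    """Mimic the scripted terms aggregation on an in-memory list of docs.

    Hypothetical helper for illustration only, not part of any Elasticsearch client.
    """
    # One combined key per document, e.g. key1=1 and key2=2 become "1_2".
    counts = Counter(sep.join(str(doc[f]) for f in fields) for doc in docs)
    # Keep only the combined keys that occur at least min_doc_count times.
    return {key: n for key, n in counts.items() if n >= min_doc_count}


docs = [
    {"key1": 1, "key2": 2},
    {"key1": 1, "key2": 2},
    {"key1": 2, "key2": 1},
]
print(duplicate_keys(docs))  # {'1_2': 2}
```

One thing the sketch makes visible: if the field values can themselves contain the separator, different pairs can collapse into the same combined key (for example "a_b" + "c" and "a" + "b_c" both give "a_b_c"), so it may be worth picking a separator that cannot occur in your data.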
Here are some sample documents:
POST docs/_doc
{
  "key1": 1,
  "key2": 2
}
POST docs/_doc
{
  "key1": 1,
  "key2": 2
}
POST docs/_doc
{
  "key1": 2,
  "key2": 1
}
And the result of the scripted query above:
{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "receipts": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "1_2",
          "doc_count": 2
        }
      ]
    }
  }
}
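If you want to act on the duplicates afterwards, the combined bucket keys need to be split back into their two parts. A rough Python sketch, assuming the response has already been loaded into a dict shaped like the one above (the helper name `duplicate_pairs` is hypothetical, and the `response` literal is trimmed to the relevant part):

```python
def duplicate_pairs(response, agg_name="receipts", sep="_"):
    """Turn scripted-terms buckets back into (key1, key2, doc_count) tuples.

    Hypothetical helper; assumes the separator does not occur inside key1.
    """
    buckets = response["aggregations"][agg_name]["buckets"]
    pairs = []
    for bucket in buckets:
        # Split only on the first separator: "1_2" -> ("1", "2").
        key1, key2 = bucket["key"].split(sep, 1)
        pairs.append((key1, key2, bucket["doc_count"]))
    return pairs


# Trimmed copy of the aggregation result shown above.
response = {
    "aggregations": {
        "receipts": {
            "buckets": [{"key": "1_2", "doc_count": 2}]
        }
    }
}
print(duplicate_pairs(response))  # [('1', '2', 2)]
```

Note that the parts come back as strings, because the script concatenates the values into a string key; convert them if the original fields are numeric.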