I have been using cardinality to find some unique fields, such as author
"aggs": {
"author_count" : {
"cardinality" : {
"field" : "author"
}
}
}
This works and counts all the author fields that have a unique author in it.
Now I want to find the total size of these unique authors. With other queries I have just done this by just adding
"aggs":{
"sum":{
"field" : "length" }}}
But when I have tried this it give me the total length of everything not just for the unique authors.
So for example if the field author contains only one "Kim" this should be returned. I want every author who has written only single book and add all of their page lengths together too.
e.g
"author" : "kim",
"length": 100
"author" : "lolo",
"length": 100
The output should be author_count 2
and total_length 200
.
But for
"author" : "kim",
"length": 100
"author" : "lolo",
"length": 100
"author" : "lolo",
"length": 100
The output should be author_count 1
and total_length 100
. Because kim is the only unique author(author who has written only one book)
Any ideas?
After understanding the question, this can be achieved with bucket selector aggregation and sum bucket aggregation. First terms aggregation on author field will give all the unique authors, then value count aggregation will give books these unique authors have written. total_sum sums the length of pages.
Now bucket selector will only retain buckets of those authors which have written only single book and finally sum_bucket sums all the length of those authors
{
"size": 0,
"aggs": {
"unique_author": {
"terms": {
"field": "author",
"size": 100
},
"aggs": {
"total_book_count": {
"value_count": {
"field": "author"
}
},
"total_sum": {
"sum": {
"field": "length"
}
},
"only_single_book_author": {
"bucket_selector": {
"buckets_path": {
"total_books": "total_book_count"
},
"script": "total_books==1"
}
}
}
},
"page_length": {
"sum_bucket": {
"buckets_path": "unique_author>total_sum"
}
}
}
}