I'm currently aggregating a collection by a multi-level nested field and calculating some sub-aggregation metrics from this collection and thats working using elasticsearch's reverse nested feature as described at Sub-aggregate a multi-level nested composite aggregation.
My current struggle is to find a way to sort the aggregations by one of the calculated metrics. For example, considering the following document and my current search call I would like to sort all the aggregations by their clicks sums.
I've tried using bucket_sort
inside the inner aggs
at the back_to_parent
level but got the following java exception.
class org.elasticsearch.search.aggregations.bucket.nested.InternalReverseNested cannot be cast to class org.elasticsearch.search.aggregations.InternalMultiBucketAggregation (org.elasticsearch.search.aggregations.bucket.nested.InternalReverseNested and org.elasticsearch.search.aggregations.InternalMultiBucketAggregation are in unnamed module of loader 'app')
{
id: '32ead132eq13w21',
statistics: {
clicks: 123,
views: 456
},
categories: [{ //nested type
name: 'color',
tags: [{ //nested type
slug: 'blue'
},{
slug: 'red'
}]
}]
}
GET /acounts-123321/_search
{
size: 0,
aggs: {
categories_parent: {
nested: {
path: 'categories.tags'
},
aggs: {
filtered: {
filter: {
term: { 'categories.tags.category': 'color' }
},
aggs: {
by_slug: {
terms: {
field: 'categories.tags.slug',
size: perPage
},
aggs: {
back_to_parent: {
reverse_nested: {},
aggs: {
clicks: {
sum: {
field: 'statistics.clicks'
}
},
custom_metric: {
scripted_metric: {
init_script: 'state.accounts = []',
map_script: 'state.accounts.add(new HashMap(params["_source"]))',
combine_script: 'double result = 0;
for (acc in state.accounts) {
result += ( acc.statistics.clicks + acc.statistics.impressions);
}
return result;',
reduce_script: 'double sum = 0;
for (state in states) {
sum += state;
}
return sum;'
}
},
by_tag_sort: {
bucket_sort: {
sort: [{ 'clicks.value': { order: 'desc' } }]
}
}
}
}
}
}
}
}
}
It would also be nice to understand how it would be possible to sort the buckets by a custom metric calculated through a painless scripted_metric
. I have updated the search call above adding a sample custom_metric
that I wish to allow sorting through it.
I see that using bucket_sort
directly does not work with the standard sort array we use for concrete fields. So the following does not seem to sort things. It also won't work for a sort script as well since [bucket_sort] only supports field based sorting
.
by_tag_sort: {
bucket_sort: {
sort: [{ 'custom_metric.value': { order: 'desc' } }]
}
}
bucket_sort
expects to be run within a multi-bucket context but your reverse_nested
aggregation is single-bucket (irrespective of the fact that it's a child of a multi-bucket terms
aggregation).
The trick is to use an empty-ish filters
aggregation to generate a multi-bucket context and then run the bucket sort:
{
"size": 0,
"aggs": {
"categories_parent": {
"nested": {
"path": "categories.tags"
},
"aggs": {
"filtered": {
"filter": {
"term": {
"categories.tags.category": "color"
}
},
"aggs": {
"by_slug": {
"terms": {
"field": "categories.tags.slug",
"size": 10
},
"aggs": {
"back_to_parent": {
"reverse_nested": {},
"aggs": {
"multi_bucket_emulator": {
"filters": {
"filters": {
"placeholder_match_all_query": {
"match_all": {}
}
}
},
"aggs": {
"clicks": {
"sum": {
"field": "statistics.clicks"
}
},
"by_tag_sort": {
"bucket_sort": {
"sort": [
{
"clicks.value": {
"order": "desc"
}
}
]
}
}
}
}
}
}
}
}
}
}
}
}
}
}
Update: sorting by the result of a custom scripted metric value
{
"size": 0,
"aggs": {
"categories_parent": {
"nested": {
"path": "categories.tags"
},
"aggs": {
"filtered": {
"filter": {
"term": {
"categories.tags.category": "color"
}
},
"aggs": {
"by_slug": {
"terms": {
"field": "categories.tags.slug",
"size": 10
},
"aggs": {
"back_to_parent": {
"reverse_nested": {},
"aggs": {
"multi_bucket_emulator": {
"filters": {
"filters": {
"placeholder_match_all_query": {
"match_all": {}
}
}
},
"aggs": {
"clicks": {
"sum": {
"field": "statistics.clicks"
}
},
"custom_metric": {
"scripted_metric": {
"init_script": "state.accounts = []",
"map_script": """state.accounts.add(params["_source"])""",
"combine_script": """
double result = 0;
for (def acc : state.accounts) {
result += ( acc.statistics.clicks + acc.statistics.impressions);
}
return result;
""",
"reduce_script": """
double sum = 0;
for (def state : states) {
sum += state;
}
return sum;
"""
}
},
"by_tag_sort": {
"bucket_sort": {
"sort": [
{
"custom_metric.value": {
"order": "desc"
}
}
]
}
}
}
}
}
}
}
}
}
}
}
}
}
}