I have the following mapping in my ElasticSearch index (simplified as the other fields are irrelevant:
{
"test": {
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"entities": {
"type": "nested",
"properties": {
"text_property": {
"type": "text"
},
"float_property": {
"type": "float"
}
}
}
}
}
}
}
The data looks like this (again simplified):
[
{
"name": "a",
"entities": [
{
"text_property": "foo",
"float_property": 0.2
},
{
"text_property": "bar",
"float_property": 0.4
},
{
"text_property": "baz",
"float_property": 0.6
}
]
},
{
"name": "b",
"entities": [
{
"text_property": "foo",
"float_property": 0.9
}
]
},
{
"name": "c",
"entities": [
{
"text_property": "foo",
"float_property": 0.2
},
{
"text_property": "bar",
"float_property": 0.9
}
]
}
]
I'm trying perform a bucket aggregation on the maximum value of float_property
for each document. So for the example above, the following would be the desired response:
...
{
"buckets": [
{
"key": "0.9",
"doc_count": 2
},
{
"key": "0.6",
"doc_count": 1
}
]
}
as doc a
's highest nested value for float_property
is 0.6, b
's is 0.9 and c
's is 0.9.
I've tried using a mixture of nested
and aggs
, along with runtime_mappings
, but I'm not sure in which order to use these, or if this is even possible.
I've managed to figure this out in the end.
The two things I hadn't realised were:
script
instead of a field
key to bucket aggregations.nested
queries, you can access nested values directly using params._source
.The combination of these two things allowed me to write the correct query:
{
"size": 0,
"aggs": {
"max.float_property": {
"terms": {
"script": "double max = 0; for (item in params._source.entities) { if (item.float_property > max) { max = item.float_property; }} return max;"
}
}
}
}
Response:
{
...
"aggregations": {
"max.float_property": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "0.9",
"doc_count": 2
},
{
"key": "0.6",
"doc_count": 1
}
]
}
}
}
I'm confused though, because I thought the correct way to access nested
fields was by using the nested
query type. Unfortunately there's very little documentation for this, so I'm still unsure if this is the intended/correct way to aggregate on scripted nested fields.