I'm wondering how to calculate the average of list elements across the documents/records I have in Elasticsearch, and how to build a bar-graph dashboard on top of it. Let me try to explain with a simple version:
Say I have three documents in ES, each with two array fields: 'runners', an array of strings, and 'runners_times', an array of numbers. The elements of runners and runners_times are ordered so that the first element of the first list corresponds to the first element of the second list (so in document 1: person_a = 100, person_b = 120). Say my three documents/records in ES look like this:
{"runners": ["person_a", "person_b"], "runners_times": [100, 120]}
{"runners": ["person_a", "person_c"], "runners_times": [90, 110]}
{"runners": ["person_b", "person_c"], "runners_times": [100, 130]}
Now, what I want is a bar-graph that gives a list of all unique runners across all three documents (so, in this case, 'person_a', 'person_b', and 'person_c') with their corresponding average times. So, in my case, that would be:
person_a: 95
person_b: 110
person_c: 120
Any tip would be great. Thanks a lot :-)
I'm able to get a list of all unique values in runners, but I'm not sure how to get the average of each person's times, since they are in a separate list.
Should I perhaps try with dictionaries? {'person_a': 100, 'person_b': 120} maybe? I tried that, too, but dictionaries get saved as a list of unfolded fields instead.
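You should reorganize your data. Each runner and their time should be stored together as one element of a nested field, so that every name stays associated with its own time. Create the destination index with the following mapping: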
PUT /runners_reindexed
{
  "mappings": {
    "properties": {
      "runner_data": {
        "type": "nested",
        "properties": {
          "runner": {
            "type": "keyword"
          },
          "time": {
            "type": "integer"
          }
        }
      }
    }
  }
}
Index your documents into the source index runners:
POST /runners/_bulk
{"create":{}}
{"runners": ["person_a", "person_b"], "runners_times": [100, 120]}
{"create":{}}
{"runners": ["person_a", "person_c"], "runners_times": [90, 110]}
{"create":{}}
{"runners": ["person_b", "person_c"], "runners_times": [100, 130]}
Then reindex the source index into the new index runners_reindexed, using a script that pairs each runner with the time at the same position:
POST _reindex
{
  "source": {
    "index": "runners"
  },
  "dest": {
    "index": "runners_reindexed"
  },
  "script": {
    "source": """
      // Read the two parallel arrays from the source document
      List runners = ctx['_source']['runners'];
      List runnerTimes = ctx['_source']['runners_times'];
      List runnersWithTimes = new LinkedList();
      // Build one {runner, time} object per position
      for (int i = 0; i < runners.size(); i++) {
        Map runnerData = new HashMap();
        runnerData['runner'] = runners[i];
        runnerData['time'] = runnerTimes[i];
        runnersWithTimes.add(runnerData);
      }
      // Store the paired objects under the nested field
      ctx._source[params['runner_with_time_field_name']] = runnersWithTimes;
    """,
    "params": {
      "runner_with_time_field_name": "runner_data"
    }
  }
}
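To make the transformation easier to follow, here is a minimal Python sketch of what the reindex script does to a single document. It is only an illustration, not part of the Elasticsearch setup: the function name pair_runners_with_times is made up for this example, and the length check is an extra safeguard that the Painless script above does not perform.

```python
def pair_runners_with_times(source):
    """Return a copy of the document with a runner_data list added.

    Mirrors the reindex script: runners[i] is paired with runners_times[i].
    """
    runners = source["runners"]
    times = source["runners_times"]
    # Extra safeguard (not in the Painless script): refuse mismatched arrays.
    if len(runners) != len(times):
        raise ValueError("runners and runners_times must have the same length")
    reindexed = dict(source)  # the original fields are kept, as in the script
    reindexed["runner_data"] = [
        {"runner": runner, "time": time} for runner, time in zip(runners, times)
    ]
    return reindexed


doc = {"runners": ["person_a", "person_b"], "runners_times": [100, 120]}
print(pair_runners_with_times(doc)["runner_data"])
```

Each element of runner_data is then indexed as its own nested object, which is what lets the aggregation group times by runner.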
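Now aggregate. The nested aggregation steps into runner_data, the terms aggregation groups by runner, and the avg sub-aggregation computes each runner's mean time: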
GET /runners_reindexed/_search?filter_path=aggregations.inside_runner_data.by_runner.buckets
{
  "aggs": {
    "inside_runner_data": {
      "nested": {
        "path": "runner_data"
      },
      "aggs": {
        "by_runner": {
          "terms": {
            "field": "runner_data.runner",
            "size": 10
          },
          "aggs": {
            "mean": {
              "avg": {
                "field": "runner_data.time"
              }
            }
          }
        }
      }
    }
  }
}
Response
{
  "aggregations" : {
    "inside_runner_data" : {
      "by_runner" : {
        "buckets" : [
          {
            "key" : "person_a",
            "doc_count" : 2,
            "mean" : {
              "value" : 95.0
            }
          },
          {
            "key" : "person_b",
            "doc_count" : 2,
            "mean" : {
              "value" : 110.0
            }
          },
          {
            "key" : "person_c",
            "doc_count" : 2,
            "mean" : {
              "value" : 120.0
            }
          }
        ]
      }
    }
  }
}
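If you want to feed these buckets into a bar chart yourself, the response can be flattened into (runner, average) pairs. The following Python sketch assumes the response has already been parsed into a dict with exactly the shape shown above; the helper name buckets_to_bars is hypothetical and the response is hard-coded here rather than fetched from a cluster.

```python
# Parsed response, copied from the output above (normally returned by a client).
RESPONSE = {
    "aggregations": {
        "inside_runner_data": {
            "by_runner": {
                "buckets": [
                    {"key": "person_a", "doc_count": 2, "mean": {"value": 95.0}},
                    {"key": "person_b", "doc_count": 2, "mean": {"value": 110.0}},
                    {"key": "person_c", "doc_count": 2, "mean": {"value": 120.0}},
                ]
            }
        }
    }
}


def buckets_to_bars(response):
    """Turn the terms buckets into (label, height) pairs for a bar chart."""
    buckets = response["aggregations"]["inside_runner_data"]["by_runner"]["buckets"]
    # The aggregation names must match the ones used in the request.
    return [(bucket["key"], bucket["mean"]["value"]) for bucket in buckets]


for runner, average in buckets_to_bars(RESPONSE):
    print(f"{runner}: {average}")
```

The resulting pairs match the averages asked for in the question (95, 110, 120) and can be passed to whatever charting tool you use.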