elasticsearch

How to get latest values for each group with an Elasticsearch query?


I have some documents indexed on Elasticsearch, looking like these samples:

{'country': 'France', 'collected': '2015-03-12', 'value': 20}
{'country': 'Canada', 'collected': '2015-03-12', 'value': 21}
{'country': 'Brazil', 'collected': '2015-03-12', 'value': 33}
{'country': 'France', 'collected': '2015-02-01', 'value': 10}
{'country': 'Canada', 'collected': '2015-02-01', 'value': 11}
{'country': 'Mexico', 'collected': '2015-02-01', 'value': 9}
...

I want to build a query that gets one result per country, getting only the ones with max(collected).

So, for the examples shown above, the results would be something like:

{'country': 'France', 'collected': '2015-03-12', 'value': 20}
{'country': 'Canada', 'collected': '2015-03-12', 'value': 21}
{'country': 'Brazil', 'collected': '2015-03-12', 'value': 33}
{'country': 'Mexico', 'collected': '2015-02-01', 'value': 9}

I realized I need to do aggregation on country, but I'm failing to understand how to limit the results on max(collected).

Any ideas?


Solution

  • You can use a top_hits aggregation that groups on the country field, returns 1 doc per group, and orders the docs by the collected date descending:

    POST /test/_search?search_type=count
    {
        "aggs": {
            "group": {
                "terms": {
                    "field": "country"
                },
                "aggs": {
                    "group_docs": {
                        "top_hits": {
                            "size": 1,
                            "sort": [
                                {
                                    "collected": {
                                        "order": "desc"
                                    }
                                }
                            ]
                        }
                    }
                }
            }
        }
    }