elasticsearch

How to put a size on a date_histogram aggregation


I'm executing a query in elasticsearch. I need to have the number of hits for my attribute "end_date_ut" (type is Date and format is dateOptionalTime) for each month represented in the index. For that, I'm using a date_histogram aggregation.

My query just bellow:

GET inc/_search
{
  "size": 0,
  "aggs": {
    "appli": {
      "date_histogram": {
        "field": "end_date_ut",
        "interval": "month"
      }
    }
  }
}

And here is a part of the result:

"hits": {
    "total": 517478,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "appli": {
      "buckets": [
        {
          "key_as_string": "2009-08-01T00:00:00.000Z",
          "key": 1249084800000,
          "doc_count": 0
        },
        {
          "key_as_string": "2009-09-01T00:00:00.000Z",
          "key": 1251763200000,
          "doc_count": 1
        },
        {
          "key_as_string": "2009-10-01T00:00:00.000Z",
          "key": 1254355200000,
          "doc_count": 2362
        },
        {
          "key_as_string": "2009-11-01T00:00:00.000Z",
          "key": 1257033600000,
          "doc_count": 5336
        },
        {
          "key_as_string": "2009-12-01T00:00:00.000Z",
          "key": 1259625600000,
          "doc_count": 7536
        },
        {
          "key_as_string": "2010-01-01T00:00:00.000Z",
          "key": 1262304000000,
          "doc_count": 8864
        }

The problem is that I have too many buckets (results). When I'm using "terms aggregation", I don't have any problems because I can set a size, but with "date_histogram aggregation" I can't find a way to put a limit on my query result.


Solution

  • I suggest to use min_doc_count to only include buckets that have data, i.e. the buckets with 0 documents would not come back in the response.

    GET inc/_search
    {
      "size": 0,
      "aggs": {
        "appli": {
          "date_histogram": {
            "field": "end_date_ut",
            "interval": "month",
            "min_doc_count": 1          <--- add this
          }
        }
      }
    }
    

    If you can, you can also add a range query in order to restrain the time interval on which the aggregation is run.