I'm running a query in Elasticsearch. I need the number of hits for my field "end_date_ut" (type date, format dateOptionalTime) for each month represented in the index. For that, I'm using a date_histogram aggregation.
My query is just below:
GET inc/_search
{
  "size": 0,
  "aggs": {
    "appli": {
      "date_histogram": {
        "field": "end_date_ut",
        "interval": "month"
      }
    }
  }
}
And here is a part of the result:
"hits": {
  "total": 517478,
  "max_score": 0,
  "hits": []
},
"aggregations": {
  "appli": {
    "buckets": [
      {
        "key_as_string": "2009-08-01T00:00:00.000Z",
        "key": 1249084800000,
        "doc_count": 0
      },
      {
        "key_as_string": "2009-09-01T00:00:00.000Z",
        "key": 1251763200000,
        "doc_count": 1
      },
      {
        "key_as_string": "2009-10-01T00:00:00.000Z",
        "key": 1254355200000,
        "doc_count": 2362
      },
      {
        "key_as_string": "2009-11-01T00:00:00.000Z",
        "key": 1257033600000,
        "doc_count": 5336
      },
      {
        "key_as_string": "2009-12-01T00:00:00.000Z",
        "key": 1259625600000,
        "doc_count": 7536
      },
      {
        "key_as_string": "2010-01-01T00:00:00.000Z",
        "key": 1262304000000,
        "doc_count": 8864
      }
The problem is that I get too many buckets (results). With a terms aggregation I don't have this problem because I can set a size, but with a date_histogram aggregation I can't find a way to limit the number of buckets returned.
I suggest using min_doc_count to only include buckets that contain data, i.e. buckets with 0 documents will not be returned in the response.
GET inc/_search
{
  "size": 0,
  "aggs": {
    "appli": {
      "date_histogram": {
        "field": "end_date_ut",
        "interval": "month",
        "min_doc_count": 1 <--- add this
      }
    }
  }
}
If you can, also add a range query to restrict the time interval over which the aggregation runs.
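As a rough sketch of that second suggestion, the request body below (sent to GET inc/_search, like the queries above) combines a range query with the same aggregation. The date bounds are placeholder values I made up for illustration, not taken from your data, so replace them with the window you actually care about:

```json
{
  "size": 0,
  "query": {
    "range": {
      "end_date_ut": {
        "gte": "2009-09-01",
        "lt": "2010-01-01"
      }
    }
  },
  "aggs": {
    "appli": {
      "date_histogram": {
        "field": "end_date_ut",
        "interval": "month",
        "min_doc_count": 1
      }
    }
  }
}
```

Since the aggregation only sees documents matched by the query, this should return at most one bucket per month inside the chosen window, and min_doc_count still drops any empty months within it.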