elasticsearch

Elasticsearch date in range date_histogram extended_bounds


I want to get date_histogram during specific period, how to restrict the date period? how should I use the extended_bounds parameter? For example : I want to query the date_histogram between '2024-07-10' and '2024-07-22', and the fixed_interval could be changed. I query with this expression insuring having the same filter range in the query:

GET /logging-events/_search

{
    "size": 1,
    "query": {
        "bool": {
            "filter": [
                {
                    "range": {
                        "timestamp": {
                            "gt": "2024-07-10T00:00:00",
                            "lt": "2024-07-22T23:59:59"
                        }
                    }
                }
            ]
        }
    },
    "aggregations": {
        "date_ranges": {
            "date_histogram": {
                "field": "timestamp",
                "fixed_interval": "8d",
                "extended_bounds": {
                    "min": "2024-07-10",
                    "max": "2024-07-22"
                },
                "format":"yyyy-MM-dd"
            }
        }
    }
}

But I still get the date_histogram not in the range.

Unexpected Response :

aggregations.date_ranges.buckets[0].key_as_string eq 2024-07-08

{
    "hits": {
        "total": {
            "value": 13,
            "relation": "eq"
        },
        "max_score": 0.0,
        "hits": [
            {
                "_index": "logging-events",
                "_id": "BuP86ZAB2uzWuz8xZdkz",
                "_score": 0.0,
                "_source": {
                    "_class": "com.ecommerce.monitoring.services.event.LoggingEvent",
                    "event_type": "CLICK",
                    "session_id": "b2f4d3a1-4d4e-4f67-9b2a-0e7b0c9b5e2f",
                    "product_id": 22,
                    "shop_id": 2,
                    "timestamp": "2024-07-10T06:15:42.000Z"
                }
            }
        ]
    },
    "aggregations": {
        "date_ranges": {
            "buckets": [
                {
                    "key_as_string": "2024-07-08",
                    "key": 1720396800000,
                    "doc_count": 6
                },
                {
                    "key_as_string": "2024-07-16",
                    "key": 1721088000000,
                    "doc_count": 7
                }
            ]
        }
    }
}

Expected: Starts from 2024-07-10


Solution

  • First of all, I think the following sentence in Elasticsearch documentation doesn't capture what's actually going one and might have confused you.

    With extended_bounds setting, you now can "force" the histogram aggregation to start building buckets on a specific min value and also keep on building buckets up to a max value (even if there are no documents anymore).

    You cannot force to build buckets on a specific value, you can only force building empty buckets before and after your last available document, but the bucket will start on values specified by the fixed_interval regardless of what you specify in extended_bounds. So, if the fixed interval is 8d it will only create buckets every 8 days since Jan 1, 1970.

    2024-07-08 is 19,912th day after Jan 1, 1970 and 19,912 is divisible by 8 so this is why bucket starts on that date and not on 2024-07-10. To move bucket to 2024-07-10 you need to add 2d offset to date histogram. So, in your case the query should look like this:

    {
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    {
                        "range": {
                            "timestamp": {
                                "gt": "2024-07-10T00:00:00",
                                "lt": "2024-07-22T23:59:59"
                            }
                        }
                    }
                ]
            }
        },
        "aggregations": {
            "date_ranges": {
                "date_histogram": {
                    "field": "timestamp",
                    "fixed_interval": "8d",
                    "format":"yyyy-MM-dd",
                    "offset": "2d"
                }
            }
        }
    }
    

    You might also want to take a look at using hard_bounds instead of query to limit buckets that are getting created.