I'm trying to get a list of devices that have not communicated in the last 30 minutes. I think my search needs to be:
The first two I can achieve by collapsing the index by the enum of the device Name, then sorting them by time and only returning the latest:
GET /<index>/_search
{
"collapse": {
"field": "Name.enum",
"inner_hits": {
"name": "most_recent",
"size": 1,
"sort": [{
"Time": {
"order": "desc"
}
}]
}
}
}
But now want to query which of these is over 30 mins old:
"query": {
"bool": {
"must_not": {
"range": {
"Time": {
"gt": "now-35m"
}
}
}
}
}
How can I run the query on the result of the sort after the collapse?
I would do it differently, not by using field collapsing, but using a terms
aggregation instead, which gives you more flexibility.
In the aggregation query below, you can see that we're aggregating the devices names, then we're taking the most recent record (using a max
metric aggregation) and finally, we're weeding out all buckets whose most recent record is older than 35 minutes ago by leveraging the bucket_selector
pipeline aggregation:
POST <index>/_search
{
"size": 0,
"aggs": {
"devices": {
"terms": {
"field": "Name.enum"
},
"aggs": {
"most_recent": {
"max": {
"field": "Time"
}
},
"older_than_30_min": {
"bucket_selector": {
"buckets_path": {
"most_recent": "most_recent"
},
"script": "params.most_recent > new Date().getTime() - 35 * 60000"
}
}
}
}
}
}
If you're on recent versions (8.11+), you can also leverage the newly released and very promising ES|QL language which makes that kind of queries much easier:
POST _query
{
"query": """
from index
| stats most_recent = max(Time) by Name
| where most_recent > NOW() - 35 minutes
| keep Name, most_recent
| limit 100
"""
}
The results look something like this
{
"columns": [
{
"name": "Name",
"type": "keyword"
},
{
"name": "most_recent",
"type": "date"
}
],
"values": [
[
"John Doe",
"2024-04-13T04:25:10.000Z"
]
]
}