[SOLVED] Aggregate objects in ElasticSearch by IP Prefix

I have an ElasticSearch index where I store internet traffic flow objects, with each object containing an IP address. I want to aggregate the data so that all objects with the same IP prefix are collected in the same bucket, without specifying a particular prefix in advance. Something like a histogram aggregation. Is this possible?

I have tried this:

GET flows/_search
{
  "size": 0,
  "aggs": {
    "ip_ranges": {
      "histogram": {
        "field": "ipAddress",
        "interval": 256
      }
    }
  }
}

But this doesn't work, probably because histogram aggregations aren't supported for ip type fields. How would you go about doing this?

Solution

First, as suggested here, the best approach would be to:

categorize the IP address at index time and then use a simple keyword field to store the class c information, and then use a term aggregation on that field to do the count.
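
The index-time approach could be sketched roughly as follows. This is only an illustration, not the linked answer's code: the `ipPrefix` field name, the `enrich` helper, and the choice of a /24 (class C) prefix are assumptions, and the field would need to be mapped as `keyword` in the index.

```python
import ipaddress


def ip_prefix(address: str, prefix_len: int = 24) -> str:
    """Return the network that contains `address`, e.g. '192.168.1.0/24'."""
    # strict=False allows a host address (with host bits set) as input
    network = ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)
    return str(network)


def enrich(doc: dict) -> dict:
    """Copy a flow document and add the hypothetical `ipPrefix` field."""
    return {**doc, "ipPrefix": ip_prefix(doc["ipAddress"])}


# Body for `GET flows/_search`: a terms aggregation on the precomputed field.
terms_query = {
    "size": 0,
    "aggs": {"by_prefix": {"terms": {"field": "ipPrefix", "size": 10}}},
}
```

Each document would be passed through `enrich` before indexing, so the aggregation itself stays a plain keyword lookup with no scripting.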

Alternatively, you could simply add a multi-field keyword mapping:

PUT myindex
{
  "mappings": {
    "properties": {
      "ipAddress": {
        "type": "ip",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

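and then extract the prefix at query time with a script (⚠️ highly inefficient, because the script runs on every matching document; as written, it buckets by the first octet only):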

GET myindex/_search
{
  "size": 0,
  "aggs": {
    "my_prefixes": {
      "terms": {
        "script": "/\\./.split(doc['ipAddress.keyword'].value)[0]",
        "size": 10
      }
    }
  }
}

As a final option, you could define the intervals of interest in advance and use an ip_range aggregation:

GET myindex/_search
{
  "size": 0,
  "aggs": {
    "my_ip_ranges": {
      "ip_range": {
        "field": "ipAddress",
        "ranges": [
          { "to": "192.168.1.1" },
          { "from": "192.168.1.1" }
        ]
      }
    }
  }
}
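
Since the goal is grouping by prefix, it may be worth noting that `ip_range` also accepts CIDR masks via a `mask` key instead of `from`/`to`. The sketch below builds such a request body; the `ip_range_agg` helper and the example prefixes are made up for illustration, and the prefixes still have to be known in advance.

```python
import ipaddress
import json


def ip_range_agg(field: str, cidrs: list) -> dict:
    """Build a search body with one ip_range bucket per CIDR block."""
    ranges = []
    for cidr in cidrs:
        # Raises ValueError if the CIDR is malformed or has host bits set
        ipaddress.ip_network(cidr)
        ranges.append({"mask": cidr})
    return {
        "size": 0,
        "aggs": {
            "my_ip_ranges": {"ip_range": {"field": field, "ranges": ranges}}
        },
    }


body = ip_range_agg("ipAddress", ["192.168.1.0/24", "192.168.2.0/24"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of `GET myindex/_search`.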