
Elastic/OpenSearch: query composite IP range like 123.[16-31].0.*


I want to run a query that filters out an IP range like 123.[16-31].0.* (16 and 31 are included). Generic example:

GET _search
{
  "query": {
    "bool": {
      "must": {
        "match_phrase": {
          "somefield": "somevalue"
        }
      },
      "must_not": {
        ... ip filter ...
      }
    }
  }
}

How do I write the must_not part?

A solution that seems to work is to put 16 ranges in the must_not:

      "must_not": [
        {
          "range": {
            "host.ip": {
              "gte": "123.16.0.0",
              "lte": "123.16.0.255"
            }
          }
        },
.... same for 17 until 30
        {
          "range": {
            "host.ip": {
              "gte": "123.31.0.0",
              "lte": "123.31.0.255"
            }
          }
        }
      ]

but that is a lot to type. I would love to use a regex like:

      "must_not": {
        "regexp": {
          "host.ip": "123.(1[6-9]|2[0-9]|30|31).0.*"
        }
      }

but obviously it fails with "Can only use regexp queries on keyword and text fields - not on [host.ip] which is of type [ip]".


Solution

  • Generating 16 ranges is the fastest solution. But if you don't run this request too often and prefer to trade some CPU time for programmer time, you can expose the ip field as a keyword field using a runtime mapping. That way, you will be able to treat this field like any other text field. So you can do something like this:

    DELETE test
    PUT test
    {
      "mappings": {
        "properties": {
          "host": {
            "properties": {
              "ip": {
                "type": "ip"
              }
            }
          }
        }
      }
    }
    
    POST test/_bulk?refresh
    {"index":{}}
    {"host":{"ip":"123.16.0.1"}}
    {"index":{}}
    {"host":{"ip":"123.16.1.0"}}
    {"index":{}}
    {"host":{"ip":"124.16.0.1"}}
    
    POST test/_search
    {
      "runtime_mappings": {
        "host.ip_string": {
          "type": "keyword",
          "script": {
            "source": "emit(doc['host.ip'].value);"
          }
        }
      },
      "query": {
        "bool": {
          "must_not": [
            {
              "regexp": {
                "host.ip_string": "123\\.(1[6-9]|2[0-9]|30|31)\\.0\\..*"
              }
            }
          ]
        }
      }
    }
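
    To sanity-check which addresses such a pattern excludes before running it against a cluster, a rough local check like the following may help. It is only a sketch: it uses Python's re module as a stand-in for Lucene's regexp engine (the two syntaxes differ in general, but agree for this simple pattern with the dots escaped), and the helper name is_excluded is made up for illustration.

```python
import re

# The regexp query pattern with the dots escaped (JSON escaping removed).
# Assumption: Python's re is used here only as an approximation of Lucene regexp syntax.
PATTERN = re.compile(r"123\.(1[6-9]|2[0-9]|30|31)\.0\..*")


def is_excluded(ip: str) -> bool:
    """Return True if the IP string would be matched (and so excluded by must_not)."""
    # Lucene regexp queries must match the whole term, hence fullmatch.
    return PATTERN.fullmatch(ip) is not None


# The three sample documents indexed by the _bulk request.
samples = ["123.16.0.1", "123.16.1.0", "124.16.0.1"]
results = {ip: is_excluded(ip) for ip in samples}
print(results)
```

    Only 123.16.0.1 is matched by the pattern, so the search should return the other two documents.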
    

    or, even better, use the numeric interval operator <16-31> (enabled by default in regexp queries) and escape the dots:

    POST test/_search
    {
      "runtime_mappings": {
        "host.ip_string": {
          "type": "keyword",
          "script": {
            "source": "emit(doc['host.ip'].value);"
          }
        }
      },
      "query": {
        "bool": {
          "must_not": [
            {
              "regexp": {
                "host.ip_string": "123\\.<16-31>\\.0\\..*"
              }
            }
          ]
        }
      }
    }
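
    If you would rather keep the faster range-based query and just avoid the typing, the 16 clauses can be generated by a small script. Below is a minimal sketch in Python; the function name ip_range_clauses and its parameters are my own invention for illustration, and the surrounding query is the generic example from the question.

```python
import json


def ip_range_clauses(field, first_octet, second_lo, second_hi, third_octet):
    """Build one range clause per second-octet value (bounds inclusive).

    Each clause covers a.b.c.0 through a.b.c.255.
    """
    clauses = []
    for second in range(second_lo, second_hi + 1):
        prefix = f"{first_octet}.{second}.{third_octet}"
        clauses.append(
            {"range": {field: {"gte": f"{prefix}.0", "lte": f"{prefix}.255"}}}
        )
    return clauses


# 123.[16-31].0.* on the host.ip field
must_not = ip_range_clauses("host.ip", 123, 16, 31, 0)

query = {
    "query": {
        "bool": {
            "must": {"match_phrase": {"somefield": "somevalue"}},
            "must_not": must_not,
        }
    }
}

# Paste the printed JSON as the body of GET _search.
print(json.dumps(query, indent=2))
```

    This produces the same 16 range clauses as writing them by hand, from 123.16.0.0-123.16.0.255 up to 123.31.0.0-123.31.0.255.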