elasticsearch aggregation composite

Expected [START_OBJECT] under [size], but got a [VALUE_NUMBER] in [composite]


I am getting this error when querying Elasticsearch (in case it matters, I am using Python to query it):

elasticsearch.BadRequestError: BadRequestError(400, 'parsing_exception', 'Expected [START_OBJECT] under [size], but got a [VALUE_NUMBER] in [composite]')

This is my query body:

{
  "query": {
    "bool": {
      "must": [
        {"range": {"@timestamp": {"gte": "2024-01-01T00:00:00.000Z", "lte": "2024-01-15T23:59:59.999Z"}}}
      ],
      "filter": [
        {"terms": {"hostip": ["xxx.xxx.xxx.xxx", "xxx.xxx.xxx.xxx", "xxx.xxx.xxx.xxx"]}},
        {"terms": {"processname": ["a.exe", "b.exe", "c.exe"]}}
      ],
      "must_not": []
    }
  },
  "aggs": {
    "composite": {
      "size": 1000,
      "sources": [
        {"date_histogram": {"field": "@timestamp",  "calendar_interval": "minute"}},
        {"processname": {"terms": {"field": "processname"}}},
        {"username": {"terms": {"field": "username"}}},
        {"hostip": {"terms": {"field": "hostip"}}},
        {"tagging": {"terms": {"field": "tagging"}}}
      ]
    }
  },
  "size": 0
}

If I move "size": 1000 to after the "sources" section, the error message changes to elasticsearch.BadRequestError: BadRequestError(400, 'parsing_exception', 'Expected [START_OBJECT] under [sources], but got a [START_ARRAY] in [composite]')

{
  "query": {
    "bool": {
      "must": [
        {"range": {"@timestamp": {"gte": "2024-01-01T00:00:00.000Z", "lte": "2024-01-15T23:59:59.999Z"}}}
      ],
      "filter": [
        {"terms": {"hostip": ["xxx.xxx.xxx.xxx", "xxx.xxx.xxx.xxx", "xxx.xxx.xxx.xxx"]}},
        {"terms": {"processname": ["a.exe", "b.exe", "c.exe"]}}
      ],
      "must_not": []
    }
  },
  "aggs": {
    "composite": {
      "sources": [
        {"date_histogram": {"field": "@timestamp",  "calendar_interval": "minute"}},
        {"processname": {"terms": {"field": "processname"}}},
        {"username": {"terms": {"field": "username"}}},
        {"hostip": {"terms": {"field": "hostip"}}},
        {"tagging": {"terms": {"field": "tagging"}}}
      ],
      "size": 1000
    }
  },
  "size": 0
}

Solution

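  • All aggregations in Elasticsearch have to be named. The key directly under "aggs" is read as the aggregation's name, and each key under that as an aggregation type whose value must be an object. The parser therefore treats "composite" as a name and "size" (or "sources") as a type, which is why it complains about a number under [size] or an array under [sources]. This query is missing two names: one for the top-level composite aggregation and one for the date_histogram source, since every entry in "sources" needs a name as well: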

    {
      "query": {
        "bool": {
          "must": [
            {"range": {"@timestamp": {"gte": "2024-01-01T00:00:00.000Z", "lte": "2024-01-15T23:59:59.999Z"}}}
          ],
          "filter": [
            {"terms": {"hostip": ["xxx.xxx.xxx.xxx", "xxx.xxx.xxx.xxx", "xxx.xxx.xxx.xxx"]}},
            {"terms": {"processname": ["a.exe", "b.exe", "c.exe"]}}
          ],
          "must_not": []
        }
      },
      "aggs": {
        "my_buckets": {
          "composite": {
            "sources": [
              {"date": {"date_histogram": {"field": "@timestamp",  "calendar_interval": "minute"}}},
              {"processname": {"terms": {"field": "processname"}}},
              {"username": {"terms": {"field": "username"}}},
              {"hostip": {"terms": {"field": "hostip"}}},
              {"tagging": {"terms": {"field": "tagging"}}}
            ],
            "size": 1000
          }
        }
      },
      "size": 0
    }
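
Since the question mentions Python, here is a minimal sketch of the same fix on the Python side. It only approximates the naming rule locally and shows one way to page through composite buckets with "after_key". The helpers `unnamed_aggs` and `iter_buckets` are hypothetical names, not part of the Elasticsearch client, and `search` is assumed to be any callable that takes a request body and returns the parsed response.

```python
import copy

# Corrected "aggs" section; "my_buckets" and "date" are the names used in the answer above.
AGGS = {
    "my_buckets": {
        "composite": {
            "size": 1000,
            "sources": [
                {"date": {"date_histogram": {"field": "@timestamp", "calendar_interval": "minute"}}},
                {"processname": {"terms": {"field": "processname"}}},
                {"username": {"terms": {"field": "username"}}},
                {"hostip": {"terms": {"field": "hostip"}}},
                {"tagging": {"terms": {"field": "tagging"}}},
            ],
        }
    }
}


def unnamed_aggs(aggs):
    """Return the top-level keys that do not look like {name: {type: {...}}}.

    Hypothetical helper: a rough local approximation of the parser rule,
    not the actual Elasticsearch validation.
    """
    bad = []
    for name, body in aggs.items():
        if not isinstance(body, dict) or not all(isinstance(v, dict) for v in body.values()):
            bad.append(name)
    return bad


def iter_buckets(search, body, agg_name="my_buckets"):
    """Yield every composite bucket, following "after_key" page by page.

    `search` is an assumed callable: request body in, parsed response out.
    """
    body = copy.deepcopy(body)  # do not mutate the caller's query
    while True:
        result = search(body)["aggregations"][agg_name]
        yield from result["buckets"]
        after_key = result.get("after_key")
        if not result["buckets"] or after_key is None:
            break
        # Ask for the page that starts after the last bucket seen so far.
        body["aggs"][agg_name]["composite"]["after"] = after_key
```

With the official Python client, `search` could be something like `lambda body: es.search(index="my-index", body=body)`. The index name and the use of the `body` argument are assumptions here and may need adjusting for your client version.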