elasticsearchaggregationelasticsearch-net

Elasticsearch aggregation on array items


Given below is my Elasticsearch document on which I want to fire an aggregation query.

{
  "id": 1,
  "attributes": [
    {
      "fieldId": 1,
      "value": "Male"
    },
    {
      "fieldId": 2,
      "value": "12/11/2015"
    }
  ]
}
{
  "id": 2,
  "attributes": [
    {
      "fieldId": 1,
      "value": "Male"
    },
    {
      "fieldId": 2,
      "value": "11/11/2015"
    }
  ]
}

The result has to be as follows.

[
  {
    "key": "Male",
    "doc_count": 1
  }
]
[
  {
    "key": "12/11/2015",
    "doc_count": 1
  },
  {
    "key": "11/11/2015",
    "doc_count": 1
  }
]

Is there a way that this can be achieved in Elasticsearch?


Solution

  • That's possible. See this example:

    We have to map attributes as nested type to be able to aggregate properly.

    PUT /test
    {
      "mappings": {
        "sample": {
          "properties": {
            "id": {
              "type": "integer"
            },
            "attributes": {
              "type": "nested",
              "properties": {
                "fieldId": {
                  "type": "integer"
                },
                "value": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            }
          }
        }
      }
    }
    

    Let's add your given test data:

    PUT /test/sample/1
    {"id":1,"attributes":[{"fieldId":1,"value":"Male"},{"fieldId":2,"value":"12/11/2015"}]}
    PUT /test/sample/2
    {"id":2,"attributes":[{"fieldId":1,"value":"Male"},{"fieldId":2,"value":"11/11/2015"}]}
    

    And finally let's run this query:

    GET /test/_search
    {
      "size": 0,
      "query": {
        "match_all": {}
      },
      "aggs": {
        "Nest": {
          "nested": {
            "path": "attributes"
          },
          "aggs": {
            "fieldIds": {
              "terms": {
                "field": "attributes.fieldId",
                "size": 0
              },
              "aggs": {
                "values": {
                  "terms": {
                    "field": "attributes.value",
                    "size": 0
                  }
                }
              }
            }
          }
        }
      }
    }
    

    What will it do?

    1. Run nested aggregation first in order to get into nested objects and aggregate them properly.
    2. Create buckets using terms aggregation for each fieldId, in your case we'll get two of them: 1 and 2.
    3. Run terms aggregation again for each of buckets above in order to get coresponding values.

    So that's the output.

    {
      "took": 2,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
      },
      "hits": {
        "total": 2,
        "max_score": 0,
        "hits": []
      },
      "aggregations": {
        "Nest": {
          "doc_count": 4,
          "fieldIds": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": 1,
                "doc_count": 2,
                "values": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    {
                      "key": "Male",
                      "doc_count": 2
                    }
                  ]
                }
              },
              {
                "key": 2,
                "doc_count": 2,
                "values": {
                  "doc_count_error_upper_bound": 0,
                  "sum_other_doc_count": 0,
                  "buckets": [
                    {
                      "key": "11/11/2015",
                      "doc_count": 1
                    },
                    {
                      "key": "12/11/2015",
                      "doc_count": 1
                    }
                  ]
                }
              }
            ]
          }
        }
      }
    }
    

    It's not precisely what you've requested. But that's closest what you can get in Elasticsearch.