elasticsearch aggregate array of strings


I need an aggregation query that returns one bucket per root folder. Every document in my Elasticsearch index has a field named path that stores an array of the paths where the document is located (e.g. path=[1.3., 1.2.4., 5., 11.]).

If I use the normal terms aggregation

"terms": {
    "field": "path.keyword"
}

I unfortunately get all unique paths:

"buckets" : [
    {
      "key" : "1.3.",
      "doc_count" : 6
    },
    {
      "key" : "11.",
      "doc_count" : 3
    },
    {
      "key" : "5.",
      "doc_count" : 3
    },
    {
      "key" : "1.2.4.",
      "doc_count" : 1
    }
]

I've tried to solve it with a Painless script:

"terms": {
    "script": "doc['path.keyword'].value.substring(0, doc['path.keyword'].value.indexOf('.')  )"
}

but then only one element of each document's path array is counted:

"buckets" : [
    {
      "key" : "1",
      "doc_count" : 7
    },
    {
      "key" : "11",
      "doc_count" : 3
    }
]

How do I get only the root folders?


Solution

  • doc['field'].value returns only a single value for the document (the first one stored), not every element of the array, which is why most of the paths are ignored. The script has to iterate over all elements of the field and return an array containing the root of each one, i.e. the substring up to the first dot.

    Sample Data:

    "hits" : [
          {
            "_index" : "index84",
            "_type" : "_doc",
            "_id" : "yihhWnEBHtQEPt4DqWLz",
            "_score" : 1.0,
            "_source" : {
              "path" : [
                "1.1.1",
                "1.2",
                "2.1.1",
                "12.11"
              ]
            }
          }
        ]
    

    Query:

    {
      "aggs": {
        "root_path": {
          "terms": {
            "script": {
              "source": "def roots = []; for (int i = 0; i < doc['path.keyword'].length; i++) { String p = doc['path.keyword'][i]; int dot = p.indexOf('.'); roots.add(dot < 0 ? p : p.substring(0, dot)); } return roots;"
            }
          }
        }
      }
    }
    

    Result:

    "aggregations" : {
        "root_path" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "1",
              "doc_count" : 1
            },
            {
              "key" : "12",
              "doc_count" : 1
            },
            {
              "key" : "2",
              "doc_count" : 1
            }
          ]
        }
      }
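
To sanity-check what the script is meant to compute, the same logic can be emulated outside Elasticsearch. The following is only a minimal Python sketch, not an Elasticsearch API: the helper names root_of and root_buckets are hypothetical, and it assumes each document is counted at most once per root, which is what the sample result above shows (the document has both "1.1.1" and "1.2", yet the "1" bucket has doc_count 1).

```python
from collections import Counter


def root_of(path: str) -> str:
    """Return the part of `path` before the first dot (the whole string if there is no dot)."""
    head, _, _ = path.partition(".")
    return head


def root_buckets(docs):
    """Count, per root folder, how many documents have at least one path under it."""
    counts = Counter()
    for doc in docs:
        # Assumption: a document contributes at most once to each root bucket,
        # so the roots are collected into a set before counting.
        counts.update({root_of(p) for p in doc["path"]})
    # Order by descending count, then by key, to mirror the bucket order shown above.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Same sample document as in the answer above.
docs = [{"path": ["1.1.1", "1.2", "2.1.1", "12.11"]}]
buckets = root_buckets(docs)
print(buckets)  # [('1', 1), ('12', 1), ('2', 1)]
```

Comparing this output with the buckets returned by the query is a quick way to confirm that the script handles every element of the array and not just one of them.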