
Elasticsearch aggregation: multi-word values in one field are returned as separate values


Some of my field values contain several words. When I run an aggregation, I receive those words as separate values.

Example:

name: jess, region: new york
name: jess, region: poland

Request:

    query = {
        "size": total,
        "aggs": {
            "buckets_for_name": {
                "terms": {
                    "field": "name",
                    "size": total
                },
                "aggs": {
                    "region_terms": {
                        "terms": {
                            "field": "region",
                            "size": total
                        }
                    }
                }
            }
        }
    }

With response["aggregations"]["buckets_for_name"]["buckets"] I get:

 {'key': 'jess ', 'doc_count': 61, 'region_terms': {'doc_count_error_upper_bound': 0, 'sum_other_doc_count': 0, 'buckets': [{'key': 'oran', 'doc_count': 60}, {'key': 'new ', 'doc_count': 1}, {'key': 'york', 'doc_count': 1}]}}, {'key': 'jess ', 'doc_count': 50, 'region_terms': {'doc_count_error_upper_bound': 0, 'sum_other_doc_count': 0, 'buckets': [{'key': 'poland', 'doc_count': 50}]}}

With:

pretty_results = []
for result in response["aggregations"]["buckets_for_name"]["buckets"]:
    d = dict()
    d["name"] = result["key"]
    d["region"] = []
    # Collect every region bucket key for this name
    for region in result["region_terms"]["buckets"]:
        d["region"].append(region["key"])
    # Append and print once per name, after all regions are collected
    pretty_results.append(d)
    print(d)

I get:

{'name': 'jess ', 'region': ['new', 'york', 'poland']}

I want to get this result:

{'name': 'jess ', 'region': ['new york', 'poland']}

Solution

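  • The region (and, I presume, name) fields were analyzed with the standard analyzer, which split new york into the tokens [new, york]. A terms aggregation on an analyzed text field buckets by token, which is why new and york show up as separate keys.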
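
To see why the buckets come back split, here is a rough pure-Python sketch of the effect. It only approximates the standard analyzer (lowercasing and splitting on non-word characters) and is not Elasticsearch's real tokenizer; the helper names standard_like_tokens and keyword_token are made up for illustration.

```python
import re
from collections import Counter


def standard_like_tokens(text):
    """Rough approximation of the standard analyzer: lowercase, split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def keyword_token(text):
    """A keyword field keeps the whole string as a single term."""
    return [text]


regions = ["new york", "poland"]

# Terms buckets are built from whatever terms were indexed for the field.
analyzed_buckets = Counter(t for r in regions for t in standard_like_tokens(r))
keyword_buckets = Counter(t for r in regions for t in keyword_token(r))

print(sorted(analyzed_buckets))  # ['new', 'poland', 'york']
print(sorted(keyword_buckets))   # ['new york', 'poland']
```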

    What you may want to do is set up a keyword mapping to treat the strings as standalone tokens:

    PUT regions
    {
      "mappings": {
        "properties": {
          "name": {
            "type": "text",
            "fielddata": true,
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            }
          },
          "region": {
            "type": "text",
            "fielddata": true,
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            }
          }
        }
      }
    }
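
From elasticsearch-py, the same mapping could be sent roughly like this. It is only a sketch: it assumes a recent client where indices.create accepts a mappings keyword (older 7.x clients may expect the whole request as body= instead), and create_regions_index is a hypothetical helper name. The index name regions comes from the example above.

```python
def text_with_keyword():
    """Field definition from the mapping above: analyzed text plus a raw keyword sub-field."""
    return {
        "type": "text",
        "fielddata": True,
        "fields": {"keyword": {"type": "keyword"}},
    }


REGIONS_MAPPINGS = {
    "properties": {
        "name": text_with_keyword(),
        "region": text_with_keyword(),
    }
}


def create_regions_index(es, index="regions"):
    """Create the index with the mapping; `es` is expected to be an Elasticsearch client."""
    return es.indices.create(index=index, mappings=REGIONS_MAPPINGS)


# Usage (assumes a reachable cluster; the URL is a placeholder):
# from elasticsearch import Elasticsearch
# create_regions_index(Elasticsearch("http://localhost:9200"))
```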
    

    After that, index your documents into this index and perform your aggs on the .keyword fields:

    {
      "size": 200,
      "aggs": {
        "buckets_for_name": {
          "terms": {
            "field": "name.keyword",         <---
            "size": 200
          },
          "aggs": {
            "region_terms": {
              "terms": {
                "field": "region.keyword",   <---
                "size": 200
              }
            }
          }
        }
      }
    }
    

    If you would rather index the value without the space (newyork), look into the pattern_replace filter within your analyzers.


    EDIT (from the comments): aggs are not part of the query; they have their own scope. So change this

    {
      "query": {
        "aggs": {
          "buckets_for_name": {
    

    to this

    {
      "query": {
        // possibly leave the whole query attribute out
      },
      "aggs": {
        "buckets_for_name": {
      ...
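
Putting the pieces together on the Python side, a minimal sketch might look like the following. It assumes the .keyword sub-fields from the mapping above exist, keeps aggs at the top level of the request body, and reuses the flattening loop from the question. build_body and flatten_buckets are illustrative names, and the sample response is hand-written rather than real cluster output.

```python
def build_body(size):
    """Search body with aggs at the top level (not nested under "query")."""
    return {
        "size": size,
        "aggs": {
            "buckets_for_name": {
                "terms": {"field": "name.keyword", "size": size},
                "aggs": {
                    "region_terms": {
                        "terms": {"field": "region.keyword", "size": size}
                    }
                },
            }
        },
    }


def flatten_buckets(response):
    """Turn the nested terms buckets into [{'name': ..., 'region': [...]}, ...]."""
    results = []
    for bucket in response["aggregations"]["buckets_for_name"]["buckets"]:
        regions = [r["key"] for r in bucket["region_terms"]["buckets"]]
        results.append({"name": bucket["key"], "region": regions})
    return results


# Hand-written sample shaped like a terms-aggregation response.
sample_response = {
    "aggregations": {
        "buckets_for_name": {
            "buckets": [
                {
                    "key": "jess",
                    "doc_count": 2,
                    "region_terms": {
                        "buckets": [
                            {"key": "new york", "doc_count": 1},
                            {"key": "poland", "doc_count": 1},
                        ]
                    },
                }
            ]
        }
    }
}

print(flatten_buckets(sample_response))
# [{'name': 'jess', 'region': ['new york', 'poland']}]
```

Against a real cluster, the response would come from something like es.search(index="regions", body=build_body(200)); depending on the client version, the body may need to be passed as keyword arguments instead.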