elasticsearchelasticsearch-aggregationelasticsearch-dsl

How to aggregate elasticsearch results based on concrete value of another aggregation?


i've problems to correctly filter my data based on previously aggregated result.

I've the following mapping for my elasticsearch data:

"properties": {
  "day": {
    "type": "date",
    "format": "yyyy-MM-dd"
  },
  "persons": {
    "type": "nested",
    "properties": {
      "id": {
        "type": "integer"
      },
      "name": {
        "type": "text",
        "analyzer": "final",
        "fields": {
          "raw": {
            "type": "keyword",
            "index": true
          }
        }
      },
      "role": {
        "type": "keyword"
      },
    }
  }
}

Let's say i save the following data in it:

{
  "workshops": [
    {
      "day": "2023-12-01",
      "persons": [
        {
          "name": "Max",
          "role": "infrastructure"
        },
        {
          "name": "Sam",
          "role": "infrastructure"
        },
        {
          "name": "Sam",
          "role": "database"
        },
        {
          "name": "Peter",
          "role": "network"
        },
        {
          "name": "Peter",
          "role": "infrastructure"
        }
      ]
    },
    {
      "day": "2023-12-02",
      "persons": [
        {
          "name": "George",
          "role": "infrastructure"
        },
        {
          "name": "Michael",
          "role": "database"
        },
        {
          "name": "Michael",
          "role": "network"
        },
        {
          "name": "Sam",
          "role": "administration"
        }
      ]
    }
  ]
}

And i've the following aggregation:

{
  "aggregations": [
    {
      "persons": {
        "nested": {
          "path": "persons"
        },
        "aggs": {
          "nestedData": {
            "terms": {
              "field": "persons.name.raw"
            },
            "aggs": {
              "name": {
                "terms": {
                  "field": "persons.name.raw"
                }
              }
            }
          }
        }
      }
    },
    {
      "persons_role_de_de": {
        "nested": {
          "path": "persons"
        },
        "aggs": {
          "nestedData": {
            "terms": {
              "field": "persons.role"
            },
            "aggs": {
              "name": {
                "terms": {
                  "field": "persons.role"
                }
              }
            }
          }
        }
      }
    }
  ]
}

When I just do a plain query (match_all) and filter for nothing I get the expected results:

What I now want to get is the following. First I filter for "Sam":

{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "persons",
            "query": {
              "bool": {
                "must": [
                  {
                    "match": {
                      "persons.name": "sam"
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

and get the following aggregations on the roles, basically the same because "Sam" is part of both workshops:

But what I really want to get is the following:

Because that are the roles "Sam" has, i'm not interested in the other roles of the other persons. My target is to get the only one workshop were "Sam" has the role "database". But I don't get the aggregation correct and the query which has to follow after selecting a role.


Solution

  • The query you've written is aggregating all the roles across all the documents, not just for the person named "Sam". To get the roles specifically for "Sam", you need to use a filter aggregation inside the nested aggregation. Here's how you can modify your query:

    In your case, the query is correctly filtering the documents where the name is "Sam", but the aggregations are being applied to all documents that match the query, not just the nested documents where the name is "Sam".

    This is because the scope of the query and the aggregations are different in nested. The query operates on the top-level documents, while the aggregations operate on the nested documents.

    In the modified query I provided below, the "sam_filter" filter aggregation ensures that only the nested documents where the name is "Sam" are included in the "nestedData" terms aggregation. This gives you the count of each role specifically for "Sam".

    Note: I used ES v8 during tests, modify the query according to your needs.

    #the mapping (workshops nested type added)
    PUT sam2
    {
      "mappings": {
        "properties": {
          "workshops": {
          "type": "nested",
          "properties": {
            
              "day": {
                "type": "date",
                "format": "yyyy-MM-dd"
              },
              "persons": {
                "type": "nested",
                "properties": {
                  "id": {
                    "type": "integer"
                  },
                  "name": {
                    "type": "text",
                    "fields": {
                      "raw": {
                        "type": "keyword",
                        "index": true
                      }
                    }
                  },
                  "role": {
                    "type": "keyword"
                  }
                }
              }
            }
          }
        }
      }
    }
    

    #sample document
    PUT sam2/_doc/1
    {
      "workshops": [
        {
          "day": "2023-12-01",
          "persons": [
            {
              "name": "Max",
              "role": "infrastructure"
            },
            {
              "name": "Sam",
              "role": "infrastructure"
            },
            {
              "name": "Sam",
              "role": "database"
            },
            {
              "name": "Peter",
              "role": "network"
            },
            {
              "name": "Peter",
              "role": "infrastructure"
            }
          ]
        },
        {
          "day": "2023-12-02",
          "persons": [
            {
              "name": "George",
              "role": "infrastructure"
            },
            {
              "name": "Michael",
              "role": "database"
            },
            {
              "name": "Michael",
              "role": "network"
            },
            {
              "name": "Sam",
              "role": "administration"
            }
          ]
        }
      ]
    }
    

    #all results
    GET sam2/_search
    {
      "size": 0,
      "query": {
        "nested": {
          "path": "workshops.persons",
          "query": {
            "term": {
              "workshops.persons.name.raw": "Sam"
            }
          }
        }
      },
      "aggs": {
        "persons": {
          "nested": {
            "path": "workshops.persons"
          },
          "aggs": {
            "nestedData": {
              "terms": {
                "field": "workshops.persons.name.raw"
              }
            }
          }
        },
        "persons_role_de_de": {
          "nested": {
            "path": "workshops.persons"
          },
          "aggs": {
            "nestedData": {
              "terms": {
                "field": "workshops.persons.role"
              }
            }
          }
        }
      }
    }
    

    #filter added inside of the nested aggs. GET sam2/_search { "size": 0, "aggs": { "persons": { "nested": { "path": "workshops.persons" }, "aggs": { "sam_filter": { "filter": { "term": { "workshops.persons.name.raw": "Sam" } }, "aggs": { "nestedData": { "terms": { "field": "workshops.persons.role" } } } } } } } }

    enter image description here