filternestedelasticsearch-queryelasticsearch-6

Elasticsearch 6.7 - How to filter a nested array field that must contain given array items


I am trying to filter search results by a field which is in a nested array object in elasticsearch (version 6.7.1).

Specifically trying to only return results that contain all terms in filter for a single field (typically this would be stored as an array of objects), in elasticsearch this has been stored as a nested type, see data structure below.

Mappings:

{
  ...
  "mappings": {
    "doc": {
      "properties": {
        ...
        "dimensions": {
          "type": "nested",
          "properties": {
            ...
            "name": {
              "fields": {
                "raw": {
                  "analyzer": "raw_analyzer",
                  "type": "text",
                  "index_options": "docs",
                  "norms": false
                }
              },
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}

Example documents:

Doc 1:

{
  ...
  "dimensions" : [
    {
      "label" : "time",
      "name" : "time"
    },
    {
      "label" : "geography",
      "name" : "geography"
    },
    {
      "label" : "statistics",
      "name" : "statistics"
    }
  ]
}

Doc 2:

{
  ...
  "dimensions" : [
    {
      "label" : "time",
      "name" : "time"
    },
    {
      "label" : "geography",
      "name" : "geography"
    },
    {
      "label" : "Age groups",
      "name" : "agegroups"
    }
  ]
}

Query is:

{
  "query": {
    "bool": {
      "filter": [
        {
          "nested": {
            "path": "dimensions",
            "query": [
              {
                "terms": {
                  "dimensions.name": [
                    "time",
                    "statistics"
                  ],
                }
              }
            ]
          }
        }
      ]
    }
  }
}

This query returns both docs 1 and 2 due to successfully matching at least one filter term in the dimensions.name.

What I would like to achieve is to only return doc 1 as it matches all terms. The logic is that if 1 or more filter terms for the dimensions.name field is missing to NOT return the document.

I have tried many variations of the above query using minimum_should_match, execution but I believe these were for older versions of elasticsearch.

Also I am wondering whether I should be modelling the data in a better way to achieve this.


Solution

  • You could achieve the same by making a must query instead filter

    {
      "query": {
        "bool": {
          "must": [
            {
              "nested": {
                "path": "dimensions",
                "query": [
                  {
                    "term": {
                      "dimensions.name": "time"
                    }
                  }
                ]
              }
            },
            {
              "nested": {
                "path": "dimensions",
                "query": [
                  {
                    "term": {
                      "dimensions.name": "statistics"
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
    

    As per the doc, you are matching for a term, so it should be part of query context. So must do the job for query context.