elasticsearch lucene elasticsearch-dsl elasticsearch-dsl-py

How can I search across all fields and also search in a specific field in Elasticsearch?


I'm very new to Elasticsearch. How do I write a query that searches for one keyword (e.g. test keyword) in all fields of the document, and for another keyword in a specific field?

This can be done with query_string, but query_string cannot search inside nested fields, even when the nested field is specified explicitly, so I'm using luqum to convert Lucene queries to Elasticsearch DSL.

Below is a sample use case.

I have a mapping:

{
  "mappings": {
    "properties": {
      "grocery_name": {
        "type": "text"
      },
      "items": {
        "type": "nested",
        "properties": {
          "name": {
            "type": "text"
          },
          "stock": {
            "type": "integer"
          },
          "category": {
            "type": "text"
          }
        }
      }
    }
  }
}

The data looks like this:

{
  "grocery_name": "Elastic Eats",
  "items": [
    {
      "name": "Red banana",
      "stock": "12",
      "category": "fruit"
    },
    {
      "name": "Cavendish banana",
      "stock": "10",
      "category": "fruit"
    },
    {
      "name": "peach",
      "stock": "10",
      "category": "fruit"
    },
    {
      "name": "carrot",
      "stock": "9",
      "category": "vegetable"
    },
    {
      "name": "broccoli",
      "stock": "5",
      "category": "vegetable"
    }
  ]
}

How can I query for all items whose name matches banana within the grocery whose grocery_name is Elastic Eats?

I tried * and _all as the field name, but neither worked.

Example query:

{
   "query": {
        "bool": {
            "must": [
                {
                    "match_phrase": {
                        "grocery_name": {
                            "query": "Elastic Eats"
                        }
                    }
                },
                {
                    "match": {
                        "*": {
                            "query": "banana",
                            "zero_terms_query": "all"
                        }
                    }
                }
            ]
        }
    }
}

I'm sure I'm missing something obvious, but I have read the manual and I'm getting no joy at all.

UPDATE: Enabling include_in_parent on the nested object makes the query below work, but it internally duplicates the nested data into the parent document, which will definitely impact memory.

{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "grocery_name": {
              "query": "Elastic Eats"
            }
          }
        },
        {
          "multi_match": {
              "query": "banana"
          }
        }
      ]
    }
  }
}

Is there any other way to do this?


Solution

  • You need a nested query with inner_hits. The nested query runs the match against the nested items documents rather than the root document, and inner_hits returns the individual nested items that matched.

    Search Query

     {
      "query": {
        "bool": {
          "filter": [
            {
              "term": {
                "grocery_name": "elastic"
              }
            },
            {
              "nested": {
                "path": "items",
                "query": {
                  "bool": {
                    "must": [
                      {
                        "match": {
                          "items.name": "banana"
                        }
                      }
                    ]
                  }
                },
                "inner_hits": {}
              }
            }
          ]
        }
      }
    }
    

    Search Result:

     "inner_hits": {
              "items": {
                "hits": {
                  "total": {
                    "value": 2,
                    "relation": "eq"
                  },
                  "max_score": 0.744874,
                  "hits": [
                    {
                      "_index": "stof_64273970",
                      "_type": "_doc",
                      "_id": "1",
                      "_nested": {
                        "field": "items",
                        "offset": 0
                      },
                      "_score": 0.744874,
                      "_source": {
                        "name": "Red banana",
                        "stock": "12",
                        "category": "fruit"
                      }
                    },
                    {
                      "_index": "stof_64273970",
                      "_type": "_doc",
                      "_id": "1",
                      "_nested": {
                        "field": "items",
                        "offset": 1
                      },
                      "_score": 0.744874,
                      "_source": {
                        "name": "Cavendish banana",
                        "stock": "10",
                        "category": "fruit"
                      }
                    }
                  ]
                }
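
    To pull the matching items out of a response like the one above, a small helper is enough. This is a minimal Python sketch, not part of any client library: the function name matched_items and the sample_response dict are hypothetical, and it assumes the excerpt above sits inside the usual hits.hits[] envelope of a search response.

```python
from typing import Any, Dict, List


def matched_items(response: Dict[str, Any], path: str = "items") -> List[Dict[str, Any]]:
    """Collect the _source of every nested inner hit found under `path`."""
    found = []
    for hit in response.get("hits", {}).get("hits", []):
        # inner_hits is keyed by the nested path unless a custom name was set
        inner = hit.get("inner_hits", {}).get(path, {})
        for nested_hit in inner.get("hits", {}).get("hits", []):
            found.append(nested_hit["_source"])
    return found


# Trimmed-down, hand-written response with the same shape as the result above.
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "inner_hits": {
                    "items": {
                        "hits": {
                            "hits": [
                                {"_source": {"name": "Red banana", "stock": "12", "category": "fruit"}},
                                {"_source": {"name": "Cavendish banana", "stock": "10", "category": "fruit"}},
                            ]
                        }
                    }
                },
            }
        ]
    }
}

names = [item["name"] for item in matched_items(sample_response)]
print(names)  # ['Red banana', 'Cavendish banana']
```

    The helper returns an empty list when a hit carries no inner_hits, so it can be applied to any response without extra checks.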
    

    Update 1:

    Based on your comments, you can use a multi_match query for your use case.

    If no fields are provided, the multi_match query defaults to the index.query.default_field index settings, which in turn defaults to *.

    (*) extracts all fields in the mapping that are eligible to term queries and filters the metadata fields. All extracted fields are then combined to build a query.

    Search Query (note the multi_match with no fields list inside the nested query):

        {
          "query": {
            "bool": {
              "filter": [
                {
                  "term": {
                    "grocery_name": "elastic"
                  }
                },
                {
                  "nested": {
                    "path": "items",
                    "query": {
                      "bool": {
                        "must": [
                          {
                            "multi_match": {
                              "query": "banana"
                            }
                          }
                        ]
                      }
                    },
                    "inner_hits": {}
                  }
                }
              ]
            }
          }
        }
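
    If you generate such queries from code, the nested clause can be built as a plain dict. The sketch below is only an illustration: nested_keyword_clause and its fields argument are hypothetical names, and the single-clause bool/must wrapper from the query above is dropped because it does not change the result. Passing fields restricts the multi_match to the listed fields instead of relying on the default.

```python
import json
from typing import Any, Dict, List, Optional


def nested_keyword_clause(keyword: str, path: str = "items",
                          fields: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build a nested clause that runs a multi_match for `keyword` under `path`."""
    multi_match: Dict[str, Any] = {"query": keyword}
    if fields:
        # Optional: search only these fields instead of the index default
        multi_match["fields"] = fields
    return {
        "nested": {
            "path": path,
            "query": {"multi_match": multi_match},
            "inner_hits": {},
        }
    }


# Same behaviour as the nested part of the query above (default fields).
clause = nested_keyword_clause("banana")

# Variant limited to two fields from the mapping in the question.
explicit = nested_keyword_clause("banana", fields=["items.name", "items.category"])
print(json.dumps(explicit, indent=2))
```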
    

    Update 2:

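    To match the keyword in either the root-level fields or the nested fields, combine a root-level multi_match and a nested multi_match as should clauses of an inner bool query, like this: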

    {
      "query": {
        "bool": {
          "must": [
            {
              "match_phrase": {
                "grocery_name": {
                  "query": "Elastic Eats"
                }
              }
            },
            {
              "bool": {
                "should": [
                  {
                    "bool": {
                      "must": [
                        {
                          "multi_match": {
                            "query": "banana"
                          }
                        }
                      ]
                    }
                  },
                  {
                    "bool": {
                      "must": [
                        {
                          "nested": {
                            "path": "items",
                            "query": {
                              "bool": {
                                "must": [
                                  {
                                    "multi_match": {
                                      "query": "banana"
                                    }
                                  }
                                ]
                              }
                            },
                            "inner_hits": {}
                          }
                        }
                      ]
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
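
    The same combined query can be assembled programmatically, which helps when an index has more than one nested path. This is a hedged sketch under assumptions: build_search and nested_paths are hypothetical names, the single-clause bool/must wrappers are omitted as redundant, and minimum_should_match is spelled out only to make explicit that at least one should clause has to match.

```python
import json
from typing import Any, Dict, Sequence


def build_search(grocery_name: str, keyword: str,
                 nested_paths: Sequence[str] = ("items",)) -> Dict[str, Any]:
    """Match grocery_name as a phrase AND keyword in root or nested fields."""
    # Root-level fields
    should = [{"multi_match": {"query": keyword}}]
    # One nested clause per nested path; inner_hits reports which items matched
    for path in nested_paths:
        should.append({
            "nested": {
                "path": path,
                "query": {"multi_match": {"query": keyword}},
                "inner_hits": {},
            }
        })
    return {
        "query": {
            "bool": {
                "must": [
                    {"match_phrase": {"grocery_name": {"query": grocery_name}}},
                    {"bool": {"should": should, "minimum_should_match": 1}},
                ]
            }
        }
    }


body = build_search("Elastic Eats", "banana")
print(json.dumps(body, indent=2))
```

    The resulting dict can be sent as the request body of a search call with whichever client you use.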