java elasticsearch elasticsearch-high-level-restclient

Elasticsearch document search by qualification


My problem is very similar to this one: Elasticsearch : search document with conditional filter. However, my data structure is a bit different, so I can't use the solution from that thread. I have many documents indexed. Each one carries what we call qualifiers, which tell me whether the document should be shown or not. This is where my problems start. Here's an example:

{
    "locale": "en_US",
    "type": "bla",
    "qualifiers": [
        {
            "criteria": [
                {
                    "type": "year_range",
                    "lower": 2024,
                    "upper": 2027
                },
                {
                    "type": "ids",
                    "values": [1, 20]
                },
                {
                    "type": "string_range_term",
                    "lower": "123455",
                    "upper": "zzzzzz"
                }
            ]
        },
        {
            "criteria": [
                {
                    "type": "year_range",
                    "lower": 2010,
                    "upper": 2012
                }
            ]
        }
    ]
}

As input, I need to provide all the parameters: year, ids, expiration date. The document needs to be matched in the following way:

Examples:

  1. input: year: 2011, id: 10, term: aaaa
  2. input: year: 2013, id: 1, term: zzzzzz
  3. input: year: 2025, id: 20, term: zzzzzz
  4. input: year: 2025, id: 50, term: zzzzzz

I'd be grateful for any hints or advice, as I've been struggling with this for 4 days now without promising results. I'm wondering whether I should reorganize the data, since it's almost 1:1 with the database structure, where it works just fine. However, I need to speed up document selection a bit, so I wanted to move the documents to ES. I'll need to implement this in Java using the REST client, but once I have the proper query in place I'll be able to convert it into Java code :) Thank you in advance.


Solution

  • You can use an Elasticsearch bool query and put AND logic inside OR logic. Because of the shape of your raw data, the qualifiers field must be mapped as nested. Here is how you can do it with must and should clauses:

    GET search_by_quafilication/_search
    {
      "query": {
        "nested": {
          "path": "qualifiers",
          "query": {
            "bool": {
              "should": [    <-- OR logic start
                {
                  "bool": {
                    "must": [  <-- first AND logic inside of OR
                      {}
                    ]
                  }
                },
                {
                  "bool": {
                    "must": [  <-- second AND logic inside of OR
                      {}
                    ]
                  }
                }
              ]
            }
          }
        }
      }
    }
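
The AND-inside-OR shape of the skeleton above can also be checked in plain Java before touching the cluster. The following is only a sketch under assumptions: the class `QualifierMatchSketch`, the `Input` holder, and the helper names are hypothetical, and the three criteria mirror the ranges and the six-letter pattern used in this answer's full query rather than any Elasticsearch API.

```java
import java.util.List;
import java.util.function.Predicate;

public class QualifierMatchSketch {

    /** Hypothetical holder for the three input parameters from the question. */
    static final class Input {
        final int year;
        final int id;
        final String term;

        Input(int year, int id, String term) {
            this.year = year;
            this.id = id;
            this.term = term;
        }
    }

    /** AND: every criterion of one qualifier must accept the input (the "must" array). */
    static boolean matchesAll(List<Predicate<Input>> criteria, Input in) {
        return criteria.stream().allMatch(c -> c.test(in));
    }

    /** OR: at least one qualifier must match as a whole (the "should" array). */
    static boolean matchesAny(List<List<Predicate<Input>>> qualifiers, Input in) {
        return qualifiers.stream().anyMatch(q -> matchesAll(q, in));
    }

    /** The two qualifiers used in this answer's query, written as predicates. */
    static List<List<Predicate<Input>>> exampleQualifiers() {
        List<Predicate<Input>> first = List.of(
            in -> in.year >= 2024 && in.year <= 2027,
            in -> in.term.matches("[A-Za-z]{6}"),
            in -> in.id >= 1 && in.id <= 20);
        List<Predicate<Input>> second = List.of(
            in -> in.year >= 2010 && in.year <= 2012);
        return List.of(first, second);
    }

    public static void main(String[] args) {
        List<Input> inputs = List.of(
            new Input(2011, 10, "aaaa"),
            new Input(2013, 1, "zzzzzz"),
            new Input(2025, 20, "zzzzzz"),
            new Input(2025, 50, "zzzzzz"));
        for (Input in : inputs) {
            System.out.println(in.year + "/" + in.id + "/" + in.term + " -> "
                + matchesAny(exampleQualifiers(), in));
        }
    }
}
```

With the four inputs from the question, the first and third evaluate to true and the other two to false. The first input matches only through the second qualifier (year 2010 to 2012), which is the OR branch doing its job.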
    

    The full example:

    PUT search_by_quafilication
    {
      "mappings": {
        "properties": {
          "qualifiers": {
            "type": "nested"
          }
        }
      }
    }
    #check the mapping
    GET search_by_quafilication
    PUT search_by_quafilication/_doc/1
    {
      "qualifiers": {
        "year": 2011,
        "ids": 10,
        "string": "aaaa"
      }
    }
    
    PUT search_by_quafilication/_doc/2
    {
      "qualifiers": {
        "year": 2013,
        "ids": 1,
        "string": "zzzzzz"
      }
    }
    
    PUT search_by_quafilication/_doc/3
    {
      "qualifiers": {
        "year": 2025,
        "ids": 20,
        "string": "zzzzzz"
      }
    }
    
    PUT search_by_quafilication/_doc/4
    {
      "qualifiers": {
        "year": 2025,
        "ids": 50,
        "string": "zzzzzz"
      }
    }
    
    
    
    
    GET search_by_quafilication/_search
    {
      "query": {
        "nested": {
          "path": "qualifiers",
          "query": {
            "bool": {
              "should": [
                {
                  "bool": {
                    "must": [
                      {
                        "range": {
                          "qualifiers.year": {
                            "gte": 2024,
                            "lte": 2027
                          }
                        }
                      },
                      {
                        "regexp": {
                          "qualifiers.string": "[A-Za-z]{6}"
                        }
                      },
                      {
                        "range": {
                          "qualifiers.ids": {
                            "gte": 1,
                            "lte": 20
                          }
                        }
                      }
                    ]
                  }
                },
                {
                  "bool": {
                    "must": [
                      {
                        "range": {
                          "qualifiers.year": {
                            "gte": 2010,
                            "lte": 2012
                          }
                        }
                      }
                    ]
                  }
                }
              ]
            }
          }
        }
      }
    }
    

    The output will only hit the examples you shared, that is, id: 1 and id: 3.

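
Since the question asks for a Java implementation, here is a minimal sketch of how the same request body could be assembled with nothing but the JDK. The class `NestedQueryBodySketch` and its helpers are hypothetical names, not part of any Elasticsearch client, and the string concatenation assumes field names and patterns that need no JSON escaping.

```java
import java.util.List;
import java.util.stream.Collectors;

public class NestedQueryBodySketch {

    /** Range clause on a field under the nested path, e.g. qualifiers.year. */
    static String range(String field, int gte, int lte) {
        return "{\"range\":{\"" + field + "\":{\"gte\":" + gte + ",\"lte\":" + lte + "}}}";
    }

    /** Regexp clause; the pattern is assumed to need no JSON escaping. */
    static String regexp(String field, String pattern) {
        return "{\"regexp\":{\"" + field + "\":\"" + pattern + "\"}}";
    }

    /** Wraps the clauses of one qualifier in bool/must (AND). */
    static String allOf(List<String> clauses) {
        return "{\"bool\":{\"must\":[" + String.join(",", clauses) + "]}}";
    }

    /** Wraps all qualifiers in bool/should (OR) inside a nested query on the given path. */
    static String nestedAnyOf(String path, List<List<String>> qualifiers) {
        String should = qualifiers.stream()
            .map(NestedQueryBodySketch::allOf)
            .collect(Collectors.joining(","));
        return "{\"query\":{\"nested\":{\"path\":\"" + path + "\",\"query\":"
            + "{\"bool\":{\"should\":[" + should + "]}}}}}";
    }

    public static void main(String[] args) {
        // Same two qualifiers as in the query of this answer.
        String body = nestedAnyOf("qualifiers", List.of(
            List.of(
                range("qualifiers.year", 2024, 2027),
                regexp("qualifiers.string", "[A-Za-z]{6}"),
                range("qualifiers.ids", 1, 20)),
            List.of(
                range("qualifiers.year", 2010, 2012))));
        System.out.println(body);
    }
}
```

The printed string can be sent as the body of `search_by_quafilication/_search`, for example through the low-level REST client's `Request.setJsonEntity`. With the high-level REST client the same structure is usually expressed with `QueryBuilders.nestedQuery`, `boolQuery`, `rangeQuery`, and `regexpQuery` instead of hand-built JSON, which avoids the escaping assumption made here.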