
Hashmap of objects in Elasticsearch


I know that Elasticsearch supports the object and nested field types. As I understand it, they represent an individual object and an array of objects, respectively. However, is it possible to have a field that is a hashmap of objects?

My use case is as follows:

Currently, I am using the nested field type. The problem is that to find the target object I have to loop through the entire array, make the change, and then re-index. This will become expensive as the field accumulates more and more objects.

I know that I can do an _update_by_query using ctx, but even that seems to loop through the field to find the target object.

Is there something that I am missing?

PS - I am a beginner in Elasticsearch


Solution

  • Update by query operates on the document source, which Elasticsearch stores as JSON. When the source is loaded for an update, it is parsed into a map, so essentially you already have everything loaded into a hashmap. You could theoretically replace

    {
      "my_objects": [{
        "id": "abc",
        "value": "foo"
      }, {
        "id": "xyz",
        "value": "foo"
      }]
    }
    

    with something like this:

    {
      "my_objects": {
        "abc": {
          "value": "foo"
        },
        "xyz": {
          "value": "foo"
        }
      }
    }
    

    and get everything in hashmap form. However, unless my_objects is a flattened field or a disabled object, this approach will very quickly cause a mapping explosion, because every distinct id becomes its own field in the mapping. The exception is when you have a very limited set of ids that repeats over and over. In other words, this approach is only applicable when you don't need to search these nested objects or when you have a limited number of ids.
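    To make the keyed layout concrete, here is a minimal sketch in Python (used only for illustration; the answer itself does not prescribe a client language). The helper name to_keyed is hypothetical. The two mapping bodies use the flattened field type and the enabled: false object setting mentioned above, written as plain dictionaries rather than sent through any particular client.

```python
from typing import Any

def to_keyed(objects: list[dict[str, Any]], key: str = "id") -> dict[str, dict[str, Any]]:
    """Turn [{"id": "abc", "value": "foo"}, ...] into {"abc": {"value": "foo"}, ...}.

    Hypothetical helper; if two objects share the same id, the later one wins.
    """
    keyed: dict[str, dict[str, Any]] = {}
    for obj in objects:
        # Everything except the key field becomes the value stored under that key.
        keyed[obj[key]] = {k: v for k, v in obj.items() if k != key}
    return keyed

# Mapping body that keeps the dynamic keys out of the index mapping by
# storing the whole object as a single flattened field.
FLATTENED_MAPPING = {
    "mappings": {"properties": {"my_objects": {"type": "flattened"}}}
}

# Alternative: keep the object in _source only, without indexing it at all.
DISABLED_MAPPING = {
    "mappings": {"properties": {"my_objects": {"type": "object", "enabled": False}}}
}

doc = {
    "my_objects": [
        {"id": "abc", "value": "foo"},
        {"id": "xyz", "value": "foo"},
    ]
}
keyed_doc = {"my_objects": to_keyed(doc["my_objects"])}
```

    With either mapping, new ids do not add fields to the mapping, which is what avoids the explosion; the trade-off is reduced or no searchability of the inner values.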

    You could also, theoretically, load these objects into another hashmap inside the update by query script and then write them back into the source, but that makes little sense: building the hashmap already requires a pass over every object, so it would only take longer.
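    The point about the extra hashmap can be seen in a small Python stand-in for the script logic (a sketch only; a real update by query script would be written in Painless, and both function names here are hypothetical). Both variants visit the array, but the second one always walks all of it before it can look anything up.

```python
def update_linear(objects, target_id, new_value):
    """Scan the list and stop at the first object whose id matches."""
    for obj in objects:
        if obj["id"] == target_id:
            obj["value"] = new_value
            return True
    return False

def update_via_map(objects, target_id, new_value):
    """Build an id -> object map first, then look the target up in it."""
    # Building the map touches every object, so a single update cannot be
    # cheaper than the linear scan above.
    by_id = {obj["id"]: obj for obj in objects}
    obj = by_id.get(target_id)
    if obj is None:
        return False
    obj["value"] = new_value  # same dict as in the list, so the list is updated
    return True

objects = [{"id": "abc", "value": "foo"}, {"id": "xyz", "value": "foo"}]
update_linear(objects, "xyz", "bar")
update_via_map(objects, "abc", "baz")
```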

    So, I think that if you want to stick with nested objects, a linear search of these objects is the best you can do, unless you want to keep them sorted in the source and do a binary search. That said, if the number of embedded objects is constantly growing and you are constantly modifying them, then maybe nested objects are not the right choice here. I would look into the join field type. It provides similar relationship functionality but lets you modify the main records and the records that represent the current nested objects independently.
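    As a rough illustration of the join field suggestion, the sketch below builds the request bodies as Python dictionaries. The field name relation, the relation names main and item, and the helper functions are all hypothetical choices for this example; only the join type, its relations setting, the name/parent structure of the field value, and the need to route a child to its parent come from how the join field works.

```python
def join_mapping(parent: str = "main", child: str = "item") -> dict:
    """Index mapping with one parent/child relation on a join field."""
    return {
        "mappings": {
            "properties": {
                "relation": {"type": "join", "relations": {parent: child}}
            }
        }
    }

def parent_body(fields: dict, parent: str = "main") -> dict:
    """Body of the main record; it only declares which side of the relation it is."""
    return {**fields, "relation": {"name": parent}}

def child_request(parent_id: str, child_id: str, fields: dict, child: str = "item") -> dict:
    """Describe an index request for one former nested object.

    The child must be routed with the parent's id so that both documents
    end up on the same shard.
    """
    return {
        "id": child_id,
        "routing": parent_id,
        "body": {**fields, "relation": {"name": child, "parent": parent_id}},
    }

mapping = join_mapping()
main_record = parent_body({"title": "example"})
abc = child_request("1", "abc", {"value": "foo"})
```

    Under this layout, changing the object abc means re-indexing only that one child document by its id and routing value, without loading or rewriting the main record or its other children.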