elasticsearch

Cost of adding field mapping in elasticsearch type


I have a use case with a set of predefined fields, and I also need to support adding dynamic fields to Elasticsearch with some basic searching on them. I can achieve this with dynamic template mappings. However, such dynamic fields are added quite frequently.

Consider this ES document for the Event type:

{
    "name":"Youth Conference",
    "venue":"Ahmedabad",
    "date":"10/01/2015",
    "organizer":"Invincible",
    "extensions":{
        "about": {
            "vision":"Visualizes the image of an ideal Country. ",
            "mission":"Encapsulates the gravity of the top reformative solutions for betterment of Country."
        }
        // Anything can go here...
    }
}

In the example above, each event document may contain unknown or new fields inside extensions. For every new dynamic field introduced, ES updates the mapping of the type. My concern is: what is the cost of adding a new field mapping to an existing type?

I am planning to separate all dynamic mappings (inside extensions) from the Event type by introducing another type, say EventExtensions, and using a parent/child relationship to link it to the Event type. I believe this may limit the cost (if any) of frequently adding dynamic fields to the type. However, to my knowledge, a parent/child relationship needs more memory.


Solution

  • The first thing to remember is that fields are defined per index, not per type. Wherever you add a new field, whether in another type or in a parent or child type, it ends up in the same index. So moving the new fields to another type in the same index will not make any difference.

    Second, adding a field is not very expensive. I know people who use 1000s of fields and are fine with it. That said, keep tabs on the number of fields so that it does not grow out of control.

    There are multiple approaches to solving the problem:

    1. Let's assume that the new field data need not be searchable. In this case, you can serialize the entire JSON to a string and store it in a single field. Make sure this field is not indexed. This way you can search on the other fields and, on retrieval of the document, get back the information that was serialized.

    2. Let's say the new fields look like this:

            {
               "newInfo1" : "log Of Info",
               "newInfo2" : "A lot more info"
            }
    

    Instead of this, you can use:

            {
               "newInfo" : [
                  {
                     "fieldName" : "newInfo1",
                     "fieldValue" : "log Of Info"
                  },
                  {
                     "fieldName" : "newInfo2",
                     "fieldValue" : "A lot more info"
                  }
               ]
            }

    This way, the number of fields won't increase. But to run a field-specific search, such as "give me all documents with fieldName equal to newInfo2 whose value contains the word more", you will need to make the newInfo field nested.
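
Both approaches can be sketched in Python as plain dictionaries, with no client library involved. This is only a rough illustration under some assumptions: the helper names (`to_opaque_string`, `to_key_value_pairs`, `nested_field_query`), the `extensionsRaw` field name, and the mapping fragments are made up for this example. The mappings use the `text` and `keyword` types of recent Elasticsearch releases; older releases used `string` with `index: no` or `index: not_analyzed` instead. The dynamic fields are also assumed to be flat key/value pairs.

```python
import json

# Approach 1: keep the dynamic part as one opaque, non-indexed string.
# Hypothetical mapping fragment; "extensionsRaw" is an assumed field name.
RAW_MAPPING = {
    "properties": {
        "extensionsRaw": {"type": "text", "index": False}
    }
}


def to_opaque_string(extensions):
    """Serialize the dynamic fields into a single JSON string."""
    return json.dumps(extensions, sort_keys=True)


# Approach 2: a fixed pair of fields, mapped as nested so that each
# fieldName stays tied to its own fieldValue at query time.
NESTED_MAPPING = {
    "properties": {
        "newInfo": {
            "type": "nested",
            "properties": {
                "fieldName": {"type": "keyword"},  # exact match on the name
                "fieldValue": {"type": "text"},    # analyzed for word search
            },
        }
    }
}


def to_key_value_pairs(extensions):
    """Turn {"newInfo1": "..."} into a list of fieldName/fieldValue objects."""
    return [
        {"fieldName": name, "fieldValue": value}
        for name, value in extensions.items()
    ]


def nested_field_query(field_name, word):
    """Build a search body: documents where the given field contains the word."""
    return {
        "query": {
            "nested": {
                "path": "newInfo",
                "query": {
                    "bool": {
                        "must": [
                            {"term": {"newInfo.fieldName": field_name}},
                            {"match": {"newInfo.fieldValue": word}},
                        ]
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    dynamic = {"newInfo1": "log Of Info", "newInfo2": "A lot more info"}
    print(to_opaque_string(dynamic))
    print(json.dumps({"newInfo": to_key_value_pairs(dynamic)}, indent=2))
    print(json.dumps(nested_field_query("newInfo2", "more"), indent=2))
```

With this layout the mapping stays the same size no matter how many distinct dynamic field names appear, because every new name becomes a value of `fieldName` rather than a new mapped field.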

    Hope this helps.