elasticsearch elasticsearch-performance

Elasticsearch - Joins best practices


I came across the following in the documentation:

 In Elasticsearch the key to good performance is to de-normalize your data into documents

And also,

the restriction that both the child and parent documents must be on the same shard

Given a scenario with a multilevel hierarchy (grandparent --> parent --> child), where some parents have more children than others, the data might be skewed and a few shards might contain exponentially more data than the others.

  1. What are the best practices for getting better performance?

  2. Is it a good idea to put the whole hierarchy in a single document (rather than one document per level)? The parent data might be redundant if there are many children, as the parent data needs to be copied to all the documents.


Solution

  • Yes, both statements you mentioned are correct. Let me answer both of your questions in the context of your use case.

    2. Is it a good idea to put the whole hierarchy in a single document (rather than one document per level)? The parent data might be redundant if there are many children, as the parent data needs to be copied to all the documents.

    Answer: In general, if all the data is in a single document, searching will definitely be much faster. That is the whole reason for denormalizing the data, as the first statement says: the query does not need multiple worker threads and does not have to combine results from multiple documents, shards, or nodes. Storage is also cheap. Denormalizing will not save storage cost, but it will save compute cost, which is higher than storage cost. In short, if you are worried about query performance, then denormalizing your data will give it a major boost.
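To make the denormalized layout concrete, here is a minimal sketch in plain Python that flattens a grandparent --> parent --> child tree into one document per child, copying the ancestor fields into each one. The field names (`grandparent_name`, `parent_name`, and so on) and the sample data are assumptions for illustration only, not something from the Elasticsearch documentation.

```python
from typing import Any, Dict, List


def denormalize(grandparent: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten a grandparent -> parent -> child tree into one flat document per child."""
    docs = []
    for parent in grandparent.get("parents", []):
        for child in parent.get("children", []):
            docs.append({
                # Ancestor fields are copied into every child document.
                "grandparent_id": grandparent["id"],
                "grandparent_name": grandparent["name"],
                "parent_id": parent["id"],
                "parent_name": parent["name"],
                "child_id": child["id"],
                "child_name": child["name"],
            })
    return docs


# Hypothetical sample hierarchy.
tree = {
    "id": "g1",
    "name": "Acme",
    "parents": [
        {"id": "p1", "name": "Sales",
         "children": [{"id": "c1", "name": "Ann"}, {"id": "c2", "name": "Bob"}]},
        {"id": "p2", "name": "Ops",
         "children": [{"id": "c3", "name": "Cy"}]},
    ],
}

flat_docs = denormalize(tree)
for doc in flat_docs:
    print(doc)
```

Each resulting document can be indexed and searched on its own, without any join at query time. The price is the redundancy raised in the question: the parent and grandparent fields are repeated once per child.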

    1. What are the best practices for getting better performance?

    Answer: If you still go ahead with the normalized approach, then, as mentioned, you should keep all the related documents on the same shard and implement custom routing to achieve that.
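As a rough sketch of what custom routing could look like, the helper below builds index requests in which every document in a lineage uses the ID of its root (the grandparent) as the `routing` value, so the whole tree lands on one shard. It assumes a mapping with a `join` field named `my_join_field` and relations grandparent --> parent --> child. The index name, field name, and the `index_request` helper are hypothetical, and the requests are only built as plain dicts, not sent to a cluster.

```python
from typing import Any, Dict, Optional
from urllib.parse import quote


def index_request(index: str, doc_id: str, relation: str, root_id: str,
                  parent_id: Optional[str] = None,
                  fields: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Describe an index request that routes the document by its root ancestor's ID."""
    join: Dict[str, Any] = {"name": relation}
    if parent_id is not None:
        # Non-root documents point at their direct parent.
        join["parent"] = parent_id
    body = dict(fields or {})
    body["my_join_field"] = join  # assumed name of the join field in the mapping
    # The same routing value for the whole lineage keeps it on one shard.
    path = f"/{quote(index, safe='')}/_doc/{quote(doc_id, safe='')}?routing={quote(root_id, safe='')}"
    return {"method": "PUT", "path": path, "body": body}


requests = [
    index_request("org", "g1", "grandparent", root_id="g1"),
    index_request("org", "p1", "parent", root_id="g1", parent_id="g1"),
    index_request("org", "c1", "child", root_id="g1", parent_id="p1"),
]

for req in requests:
    print(req["method"], req["path"], req["body"])
```

Note that the child is routed by the grandparent's ID, not by its direct parent's ID. This is also why a grandparent with very many descendants produces the skew described in the question: its entire subtree has to live on a single shard.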