I came across the following in the documentation:
In Elasticsearch the key to good performance is to de-normalize your data into documents
And also,
the restriction that both the child and parent documents must be on the same shard.
Consider a multilevel hierarchy (grandparent -> parent -> child) where some parents have many more children than others, so the data may be skewed and a few shards may hold far more data than the rest.
What are the best practices for getting better performance?
Is it a good idea to put the whole hierarchy in a single document (rather than one document per level)? The parent data would be redundant when there are many children, since it has to be copied into every document.
Yes, both statements you mentioned are correct. Let me answer both of your questions in the context of your use case.
Answer: In general, if all the data is in a single document, searching will be much faster. That is the whole reason for denormalizing data, as the first statement says: you don't need multiple worker threads to gather and combine results from multiple documents, shards, or nodes. Denormalizing does not save storage, because the parent data is duplicated, but storage is cheap and you save compute cost, which is costlier than storage. In short, if you are worried about query performance, denormalizing your data will give it a major boost.
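To make the trade-off concrete, here is a minimal sketch of denormalizing a three-level hierarchy into one flat document per child. It is plain Python with no Elasticsearch client involved, and the field names (`grandparent_name`, `parent_name`, and so on), the sample data, and the `denormalize` helper are all hypothetical, chosen only for illustration.

```python
def denormalize(grandparent, parents, children):
    """Return one flat document per child, copying ancestor fields into it."""
    parents_by_id = {p["id"]: p for p in parents}
    docs = []
    for child in children:
        parent = parents_by_id[child["parent_id"]]
        docs.append({
            "child_id": child["id"],
            "child_name": child["name"],
            # Parent and grandparent fields are duplicated on purpose:
            # each document can then be searched on its own.
            "parent_id": parent["id"],
            "parent_name": parent["name"],
            "grandparent_id": grandparent["id"],
            "grandparent_name": grandparent["name"],
        })
    return docs


# Hypothetical sample data: one grandparent, two parents, three children.
grandparent = {"id": "gp-1", "name": "Acme"}
parents = [
    {"id": "p-1", "name": "Sales"},
    {"id": "p-2", "name": "Support"},
]
children = [
    {"id": "c-1", "name": "Alice", "parent_id": "p-1"},
    {"id": "c-2", "name": "Bob", "parent_id": "p-1"},
    {"id": "c-3", "name": "Carol", "parent_id": "p-2"},
]

flat_docs = denormalize(grandparent, parents, children)
```

Here the "Sales" parent is stored twice, once per child, which is exactly the redundancy you asked about. In exchange, a search on `parent_name` or `grandparent_name` only has to look at a single document type.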
Answer: If you still go ahead with the normalized approach then, as mentioned, you should keep all the related documents on the same shard, and you should implement custom routing to achieve that.
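As a rough illustration of what custom routing does, the sketch below simulates shard placement when every document in a family is routed by its grandparent id. The `pick_shard` function uses an MD5-based hash as a stand-in and is an assumption made for this example, not the hash function Elasticsearch uses internally. The shard count, ids, and family sizes are hypothetical as well. In a real cluster you would pass the same value as the `routing` parameter when indexing and searching.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical shard count


def pick_shard(routing_value: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a routing value to a shard number (stand-in hash, not Elasticsearch's)."""
    digest = hashlib.md5(routing_value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Hypothetical skewed data: one family is far larger than the others.
family_sizes = {"gp-1": 1000, "gp-2": 10, "gp-3": 10}

placements = []  # (doc_id, routing_value, shard)
for gp_id, size in family_sizes.items():
    for n in range(size):
        doc_id = f"{gp_id}-doc-{n}"
        # Route every document in the family by the grandparent id,
        # so parents and children always share a shard.
        placements.append((doc_id, gp_id, pick_shard(gp_id)))

docs_per_shard = Counter(shard for _, _, shard in placements)
shards_per_family = {
    gp_id: {shard for _, routing, shard in placements if routing == gp_id}
    for gp_id in family_sizes
}
```

Each family ends up on exactly one shard, which satisfies the parent/child restriction. The sketch also shows the skew from your scenario: all 1000 documents of `gp-1` sit on a single shard, so routing keeps related documents together but does not balance the load by itself.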