My company uses off-the-shelf software that exports its logs to Elasticsearch (and also reads those logs back). The software creates one index per day for every data type; for example, "A" record data => A_Data_2022_12_13, A_Data_2022_12_14, and so on. Because of this storage scheme, our Elasticsearch cluster has thousands of shards for only 100GB of data. I want to merge all those shards into a small number of shards, 1 or 2 for every data type.
I thought about reindexing, but I think it is overkill for my purpose, because I want the data to stay exactly as it is now, just merged into one shard.
What is the best practice for doing this? Thanks!
I tried reindex, but it takes a long time, and I am not sure it is the right solution.
Too many shards can cause excessive heap usage, and unbalanced shards can cause hot spots in the cluster. Your decision is right: you should combine the small indices into one or a few larger indices. That gives you fewer, more evenly sized shards, and therefore a more stable cluster.
What can you do?
Note: You can speed up the reindex by slicing it (the slices parameter) and by setting number_of_replicas to 0 on the destination index while the reindex runs.
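As a rough illustration of the note above, here is a minimal Python sketch that groups the daily indices by data type and builds the three REST requests one would send per type: create a destination index with few shards and no replicas, reindex into it with slicing, then restore replicas. It only builds the requests and does not talk to a cluster. The helper names (group_by_type, merge_plan), the merged_ destination prefix, and the replica count of 1 restored at the end are assumptions for illustration, not something from your setup. The destination name is lowercased because Elasticsearch index names must be lowercase.

```python
import json
import re
from collections import defaultdict

# Assumed naming scheme from the question: <type>_YYYY_MM_DD
DAILY_INDEX = re.compile(r"^(?P<prefix>.+)_\d{4}_\d{2}_\d{2}$")


def group_by_type(index_names):
    """Group daily index names by their data-type prefix."""
    groups = defaultdict(list)
    for name in index_names:
        match = DAILY_INDEX.match(name)
        if match:  # skip indices that do not follow the daily pattern
            groups[match.group("prefix")].append(name)
    return {prefix: sorted(names) for prefix, names in groups.items()}


def merge_plan(prefix, sources, shards=1, slices="auto"):
    """Return the (method, path, body) requests to merge one data type."""
    # Hypothetical destination name; it must differ from every source,
    # because reindex cannot write into an index it is reading from.
    dest = f"merged_{prefix}".lower()
    return [
        # 1. Create the destination with few shards and no replicas.
        ("PUT", f"/{dest}",
         {"settings": {"index": {"number_of_shards": shards,
                                 "number_of_replicas": 0}}}),
        # 2. Copy the documents as a background task, split into slices.
        ("POST", f"/_reindex?slices={slices}&wait_for_completion=false",
         {"source": {"index": sources}, "dest": {"index": dest}}),
        # 3. After the task finishes, restore replicas (1 is an assumption).
        ("PUT", f"/{dest}/_settings",
         {"index": {"number_of_replicas": 1}}),
    ]


if __name__ == "__main__":
    names = ["A_Data_2022_12_13", "A_Data_2022_12_14", "B_Data_2022_12_13"]
    for prefix, sources in group_by_type(names).items():
        for method, path, body in merge_plan(prefix, sources):
            print(method, path, json.dumps(body))
```

Each tuple can be sent with any HTTP client or pasted into Kibana Dev Tools. If slices=auto does not give enough parallelism for many single-shard source indices, an explicit number can be passed via the slices argument instead. Delete the old daily indices only after checking that the document counts match.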