
Elasticsearch: Reindexing doubled the size of an index


I just did a full reindex from a dump of a previous index, but the newly created index was double the size of the previous one even before it had indexed all the documents. What could be the reason?

The previous index was 3.7 GB and the new one is 7 GB.

Update: It has now come down to 5.2 GB (probably due to segment merges), but as you can see it is still larger than the previous index, which is 3.7 GB.

[Screenshot: sizes of the two indices]

Here's the shards output for both indices: [screenshot]


Solution

  • The old and new index sizes differ because of unassigned shards.

    GET _cat/shards/index_name_1,index_name_2?v
    

    The above API call shows that the smaller index has some unassigned shards. Unassigned shards affect store.size: it is the sum of the sizes of all assigned shard copies (primaries and replicas), so a shard that is unassigned is not counted.

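    For the bigger index, pri.store.size and store.size differ. This means one of the replicas of the bigger index is allocated, while the 2 replicas of the smaller index remain unassigned. Since pri.store.size counts only primary shards, it is the fairer figure to compare between the two indices.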

    You can check why the shards are unassigned with the following API call.

    GET _cluster/allocation/explain
    

    By default, Elasticsearch retries a failed shard allocation up to 5 times (the index.allocation.max_retries setting). Once those attempts have failed, it no longer tries to allocate the shards automatically. You can trigger another round of allocation attempts with the following API call.

    POST _cluster/reroute?retry_failed=true
    

    Note that if a node has hit a disk watermark (i.e. it has insufficient disk space), the allocation will fail again. You can free up disk space by removing old indices, old Elasticsearch logs, etc.
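
The size accounting described in this answer can be sketched in Python. This is only an illustration, not part of the original answer. It assumes the rows come from `GET _cat/shards?format=json&bytes=b` and `GET _cat/allocation?format=json&bytes=b`, where unassigned shards report `store` as null. The function names and sample numbers are hypothetical, and 85 is assumed to be the default low disk watermark percentage.

```python
from collections import defaultdict


def summarize_store(shard_rows):
    """Sum shard sizes per index; unassigned shards contribute nothing."""
    summary = defaultdict(lambda: {"pri_store": 0, "store": 0, "unassigned": 0})
    for row in shard_rows:
        entry = summary[row["index"]]
        # An unassigned shard has no store size, so it is only counted, not summed.
        if row["state"] == "UNASSIGNED" or row["store"] is None:
            entry["unassigned"] += 1
            continue
        size = int(row["store"])
        entry["store"] += size          # primaries and replicas
        if row["prirep"] == "p":
            entry["pri_store"] += size  # primaries only
    return dict(summary)


def nodes_over_watermark(allocation_rows, low_pct=85):
    """Return node names whose disk usage is at or above the assumed low watermark."""
    return [
        row["node"]
        for row in allocation_rows
        if row.get("disk.percent") is not None and int(row["disk.percent"]) >= low_pct
    ]


if __name__ == "__main__":
    # Made-up sizes in bytes, not taken from the screenshots in the question.
    sample = [
        {"index": "index_name_1", "prirep": "p", "state": "STARTED", "store": "100"},
        {"index": "index_name_1", "prirep": "r", "state": "UNASSIGNED", "store": None},
        {"index": "index_name_1", "prirep": "r", "state": "UNASSIGNED", "store": None},
        {"index": "index_name_2", "prirep": "p", "state": "STARTED", "store": "70"},
        {"index": "index_name_2", "prirep": "r", "state": "STARTED", "store": "70"},
    ]
    for name, sizes in summarize_store(sample).items():
        print(name, sizes)
```

In this made-up sample, `index_name_2` has the smaller primary but the larger `store`, because its replica is assigned and therefore counted. If `nodes_over_watermark` returns any nodes, `retry_failed=true` is likely to fail again until space is freed.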