
Why does Elasticsearch allow you to specify multiple disk partitions in the .yml file if it doesn't balance shards across partitions?


This is a follow-up to a question I asked previously here.

I have a cluster with three data nodes and one head node. The hard drive on each data node has three partitions: /data1, /data2 and /data3. I configured elasticsearch.yml on the head node like this:

path.data: /data1/elasticsearch, /data2/elasticsearch_2, /data3/elasticsearch_3

My existing index is stored in /data1/elasticsearch on each node. However, when I disable replication and load the data for my new index, I hit the low disk watermark: /data1 doesn't have enough space.
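Because the watermark is hit by a single partition, it helps to look at each data path separately rather than at the node as a whole. Below is a minimal sketch that reports the percentage of each path's filesystem in use and compares it to 85%, which is Elasticsearch's default low disk watermark. It assumes the paths from the question and uses Python's standard `shutil.disk_usage`. The percentage is only an approximation of what Elasticsearch itself measures.

```python
import os
import shutil

# Elasticsearch's default low disk watermark: no new shards go to a node
# whose disk is more than 85% used.
LOW_WATERMARK_PCT = 85.0


def path_usage_pct(path: str) -> float:
    """Percent of the filesystem holding `path` that is not available."""
    usage = shutil.disk_usage(path)
    # usage.free is the space available to unprivileged users, so
    # (total - free) also counts any blocks reserved for root.
    return 100.0 * (usage.total - usage.free) / usage.total


def report(paths, threshold=LOW_WATERMARK_PCT):
    """Return {path: (percent_used, over_threshold)} for each data path."""
    result = {}
    for p in paths:
        pct = path_usage_pct(p)
        result[p] = (round(pct, 1), pct > threshold)
    return result


if __name__ == "__main__":
    # Paths taken from the question; adjust to your own partitions.
    data_paths = ["/data1/elasticsearch", "/data2/elasticsearch_2", "/data3/elasticsearch_3"]
    existing = [p for p in data_paths if os.path.exists(p)]  # skip missing paths
    for path, (pct, over) in report(existing).items():
        flag = "OVER low watermark" if over else "ok"
        print(f"{path}: {pct}% used ({flag})")
```

If /data1 shows as over the threshold while /data2 and /data3 are mostly empty, that matches the situation described above.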

Looking through the Elasticsearch documentation I found this warning:

Elasticsearch does not balance shards across a node’s data paths. High disk usage in a single path can trigger a high disk usage watermark for the entire node. If triggered, Elasticsearch will not add shards to the node, even if the node’s other paths have available disk space. If you need additional disk space, we recommend you add a new node rather than additional data paths.
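To make the quoted warning concrete, here is a toy model. It is a simplified illustration of the documented behavior, not Elasticsearch's actual allocation-decider code. The node is judged by its fullest path, so one full path blocks new shards even when the node's average usage is low. The class and function names are hypothetical. The 90% figure is Elasticsearch's default high disk watermark.

```python
from dataclasses import dataclass
from statistics import mean

# Elasticsearch's default high disk watermark (percent of disk used).
HIGH_WATERMARK_PCT = 90.0


@dataclass
class DataPath:
    """Hypothetical record of one data path and its disk usage."""
    name: str
    used_pct: float


def node_accepts_new_shards(paths, high=HIGH_WATERMARK_PCT):
    """Toy model: the fullest path decides for the whole node."""
    worst = max(p.used_pct for p in paths)
    return worst <= high


if __name__ == "__main__":
    paths = [DataPath("/data1", 93.0), DataPath("/data2", 10.0), DataPath("/data3", 5.0)]
    print("average usage:", mean(p.used_pct for p in paths))  # 36.0
    print("accepts new shards:", node_accepts_new_shards(paths))  # False
```

In this model, free space on /data2 and /data3 never offsets the full /data1. That is why the documentation recommends adding a node rather than more data paths.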

So my question is: why does Elasticsearch allow you to specify multiple paths for data storage if it doesn't allocate shards to the next empty path on the node?


Solution

  • The option to use multiple data paths is being phased out. The feature has several problems, such as the one you mentioned and the fact that Kibana can report the wrong free space when a node uses multiple disks.

    According to this GitHub issue, multiple data paths are planned to be deprecated in version 7.13 and removed in version 8.0.

    According to the same issue:

    (...) multiple-data-paths is a feature that has a very high cost (numerous bugs and deficiencies in its design), relatively few users, and most importantly, better alternatives that are standard outside of Elasticsearch.