kubernetes · distributed-system · etcd · raft · cap-theorem

How is ETCD a highly available system, even though it uses Raft, which is a CP algorithm?


This is from Kubernetes documentation:

Consistent and highly-available key value store used as Kubernetes' backing store for all cluster data.

Does Kubernetes have a separate mechanism internally to make ETCD more available? Or does ETCD use, let's say, a modified version of Raft that enables this superpower?


Solution

  • When it comes to etcd details, it is best to consult the official etcd documentation:

    etcd is a strongly consistent, distributed key-value store that provides a reliable way to store data that needs to be accessed by a distributed system or cluster of machines. It gracefully handles leader elections during network partitions and can tolerate machine failure, even in the leader node.

    Note that high availability is not mentioned here. As for fault tolerance, there is a very good paragraph on this topic here:

    An etcd cluster operates so long as a member quorum can be established. If quorum is lost through transient network failures (e.g., partitions), etcd automatically and safely resumes once the network recovers and restores quorum; Raft enforces cluster consistency. For power loss, etcd persists the Raft log to disk; etcd replays the log to the point of failure and resumes cluster participation. For permanent hardware failure, the node may be removed from the cluster through runtime reconfiguration.

    It is recommended to have an odd number of members in a cluster. An odd-size cluster tolerates the same number of failures as an even-size cluster but with fewer nodes.
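    The arithmetic behind that recommendation is easy to check. A minimal sketch (the function names are mine, not from etcd): quorum is a strict majority, and adding one node to an odd-sized cluster raises the quorum without raising the number of tolerated failures.

    ```python
    # Quorum math behind the "odd number of members" recommendation.
    # A cluster of n members needs a strict majority to operate.

    def quorum(n: int) -> int:
        """Minimum number of members that must agree (strict majority)."""
        return n // 2 + 1

    def tolerated_failures(n: int) -> int:
        """How many members can fail while a quorum remains reachable."""
        return n - quorum(n)

    for n in range(1, 8):
        print(f"size {n}: quorum {quorum(n)}, tolerates {tolerated_failures(n)}")
    # Sizes 3 and 4 both tolerate 1 failure; sizes 5 and 6 both tolerate 2.
    ```

    So a 4-node cluster buys nothing over a 3-node cluster in terms of fault tolerance, while requiring one more member to reach quorum.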

    You can also find a very good article on understanding etcd:

    Etcd is a strongly consistent system. It provides Linearizable reads and writes, and Serializable isolation for transactions. Expressed more specifically, in terms of the PACELC theorem, an extension of the ideas expressed in the CAP theorem, it is a CP/EC system. It optimizes for consistency over latency in normal situations and consistency over availability in the case of a partition.
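    The trade-off between linearizable and serializable reads can be illustrated with a toy quorum-read model (my own sketch, not etcd's actual implementation, which routes linearizable reads through the Raft leader). Each replica holds a `(revision, value)` pair; because any read majority overlaps any write majority, reading a majority and taking the highest revision yields the latest committed value, while a serializable read returns the local replica's value without coordination.

    ```python
    # Toy illustration of linearizable vs. serializable reads.
    # Replica "c" is a lagging follower that has not yet applied revision 7.
    REPLICAS = {
        "a": (7, "new"),
        "b": (7, "new"),
        "c": (6, "old"),
    }

    def linearizable_read(replicas: dict) -> str:
        # Consult a strict majority; quorum intersection guarantees at
        # least one member has the latest committed revision.
        majority = list(replicas.values())[: len(replicas) // 2 + 1]
        return max(majority)[1]

    def serializable_read(replicas: dict, local: str = "c") -> str:
        # No coordination: low latency, but may serve stale data.
        return replicas[local][1]
    ```

    This is the "C over L" half of PACELC in miniature: the linearizable read pays a coordination cost to stay consistent, while the serializable read is fast but can return stale data from the lagging replica.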
