distributed-computing · consensus · raft

Raft consensus with a shared log: good or bad idea?


The Raft consensus protocol requires nodes to maintain a replicated log, and all the implementations I'm aware of require every node to have durable local storage for that log. In a cloud-native environment such as Kubernetes, nodes are easier to manage if they don't need persistent volumes, because they can then be deployed as a plain Deployment, whereas persistent volumes require a StatefulSet, which comes with additional complexities and limitations.

I was wondering if Raft can be successfully used in a scenario where the nodes share a common log to which they can append and from which they can read, and which assigns a monotonically increasing record id to each appended record. This can easily be achieved with a Kafka topic, but other options are possible.
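
For instance, with the kafka-python client this could be as simple as the following sketch (the topic name kv-log, the broker address, and the single-partition assumption are mine, not part of the question; a single partition is what makes the offset a single monotonically increasing sequence):

```python
from kafka import KafkaProducer

# A single-partition topic is assumed so that the Kafka offset alone
# acts as the monotonically increasing record id.
producer = KafkaProducer(bootstrap_servers="localhost:9092")

future = producer.send("kv-log", key=b"user:42", value=b"alice")
metadata = future.get(timeout=10)  # blocks until the broker acknowledges

# (partition, offset) uniquely identifies the record; with one partition,
# the offset is the monotonically increasing id.
print(metadata.partition, metadata.offset)
```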

What would be the implications of doing this? In other words, I'd like to understand the trade-offs for choosing to implement Raft consensus with a shared log as opposed to a replicated log.

To illustrate how this might work, here is a sketch of how a replicated key-value store based on Raft would work with a shared log (the parentheses give details of a Kafka topic-based implementation):

  1. The initial leader election proceeds as per the Raft protocol.
  2. The elected leader appends the Put(K,V) transactions it receives from clients to the shared log (publishes them to the Kafka topic).
  3. Once the leader receives the acknowledgement from the log, it records the unique message id (Kafka partition and offset) and sends it to the followers. Followers record the latest message id and acknowledge it.
  4. Once a majority of followers have acknowledged receipt of the message id, the leader updates its local in-memory key-value store and acknowledges the transaction to the client.
  5. Followers asynchronously process new Put(K,V) messages from the log (consume them from the Kafka topic) and update their in-memory key-value stores.
  6. If the leader stops responding, another follower is elected leader. Only followers that have the latest message id (Kafka topic offset) are eligible, to make sure the new leader does not have stale data.
  7. If a follower crashes and restarts, it loses the last message id it recorded, but it can read the log and reconstruct its state (consume all records from the Kafka topic, as sketched below).
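
To make step 7 concrete, here is a hedged sketch of follower recovery by replaying the shared log (the kv-log topic, the single partition, and the raw-bytes encoding are all my assumptions):

```python
from kafka import KafkaConsumer, TopicPartition

def rebuild_state(bootstrap: str) -> tuple[dict, int]:
    """Reconstruct a follower's in-memory KV store by replaying the log."""
    tp = TopicPartition("kv-log", 0)   # single-partition topic assumed
    consumer = KafkaConsumer(bootstrap_servers=bootstrap,
                             enable_auto_commit=False)
    consumer.assign([tp])
    consumer.seek_to_beginning(tp)

    # Replay up to the high-water mark observed at the start of recovery.
    end = consumer.end_offsets([tp])[tp]
    store, last_offset = {}, -1
    while last_offset < end - 1:
        for records in consumer.poll(timeout_ms=1000).values():
            for record in records:
                store[record.key] = record.value   # apply Put(K, V)
                last_offset = record.offset
    consumer.close()
    return store, last_offset
```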

Followers can be used to serve clients' Get(K) requests. If eventual consistency is acceptable, any follower can respond to the client as long as it has processed the message corresponding to the latest id it received from the leader. If read-your-writes is required, the client can store the message id it receives from the leader and send it to the follower; in that case, the follower blocks until it has received and processed the corresponding message from the log.
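
A minimal sketch of that blocking read path on the follower (the class and its names are hypothetical; only the offset comparison reflects the scheme described above):

```python
import threading

class FollowerStore:
    """Illustrative follower-side store with read-your-writes support.

    apply() is called by the log-consuming loop for each new record;
    get() blocks until the offset the client saw has been applied.
    """
    def __init__(self):
        self._store = {}
        self._applied_offset = -1
        self._cond = threading.Condition()

    def apply(self, record):
        with self._cond:
            self._store[record.key] = record.value
            self._applied_offset = record.offset
            self._cond.notify_all()

    def get(self, key, min_offset=-1, timeout=5.0):
        with self._cond:
            caught_up = self._cond.wait_for(
                lambda: self._applied_offset >= min_offset, timeout)
            if not caught_up:
                raise TimeoutError(
                    f"follower has not caught up to offset {min_offset}")
            return self._store.get(key)
```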


Solution

  • "Raft consensus protocol requires nodes to have a replicated log" - this is actually other way around, raft protocol exists to make that log possible. If one has a way to get a linear log - there is no need in raft.

    The goal of Raft (or any other consensus protocol) is to agree on something. In Raft's case, several completely independent nodes (no resource sharing) agree on a stream of events. Hence, if a system already has a tool that provides such a stream, then no Raft is needed at all. In the Kafka example, every client could write to the Kafka topic directly; there is no need for a dedicated leader.
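
    As a sketch of that point, under the same single-partition kv-log assumption as above, every replica can simply consume and apply, with no leader and no election:

    ```python
    from kafka import KafkaConsumer, TopicPartition

    # No leader, no election: each replica independently consumes the same
    # totally ordered log and applies it to a deterministic state machine,
    # so all replicas converge without Raft-style coordination.
    tp = TopicPartition("kv-log", 0)
    replica = KafkaConsumer(bootstrap_servers="localhost:9092",
                            enable_auto_commit=False)
    replica.assign([tp])
    replica.seek_to_beginning(tp)

    store = {}
    for record in replica:                 # records arrive in offset order
        store[record.key] = record.value   # apply Put(K, V) forever
    ```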

    Fun fact: Kafka itself indirectly uses a consensus algorithm under the hood to manage availability. See the details here: https://kafka.apache.org/documentation/#design_replicatedlog (the part where they talk about configuration being managed by consensus).