consensusraftdistributed-databasepaxosdata-consistency

The relationship between Paxos family and data consistency


Paxos, a kind of consensus algorithm, plays a vital role in distributed database systems. It can be used to make the distributed system chooses the same proposal. Data consistency is a big problem in the database system. Others consisder that Paxos or raft is used in many distributed systems to ensure data consistency. I can't agree with them.

My first question is whether log append is just a Paxos implementation. We use the same log to ensure consistency. Is log replication or data replication the easiest way to keep data consistent? Is network transmission the significant difference between them?

Can we treat the Paxos family as a consistency algorithm? If not, how can we ensure data consistency in a distributed system? (In my opinion, Paxos is just a kind of consensus algorithm, not a consistent algorithm)

In a distributed system, when and why will we use Paxos? Whether we use Paxos for achieving high system availability.

I would be very grateful indeed for any help you could give me.


Solution

  • Consensus algs (paxos, raft, etc) allow a set of nodes to agree on something; for example, nodes may agree on a sequence of log events. If there is a sequence of event e1, e2, ..., en - then all nodes in a cluster running a consensus algorithm will EVENTUALLY get the same sequence. We call this sequence "log".

    From a node in a consensus-run cluster, the log is an eventually consistent data structure. Eventual consistency comes from the fact, that any particular node can be partitioned away from the cluster - when that node will get back, the node will eventually receive all events for the log (in the same order).

    We can use the log as a part of a database system - every mutation of data (insert, update, delete) will be first written to the log, and then applied to some local copy of data for optimization. So if a client will query a database on one of these nodes, the operation is still eventually consistent - as it is the same case as in previous paragraph.

    Let's define what strong consistency is. There are several definitions, the one I choose to use: after an update is done, every following read will see the updated data.

    Clusters running a consensus protocol may be used as strongly consistent system. To achieve that, these systems use "linearizable reads". It is easier to explain how that works using an example:

    1. let's say we have a cluster of three nodes running raft; node A is the leader and B, C are followers
    2. an update comes to A: "set variable x to 1"
    3. A will use consensus protocol to get the majority to agree on x=1
    4. when a client wants to read the value of x; they will have to use linearizable read for that (instead of going to any node and ask for the value directly)
    5. the client will send to A "read value of X"
    6. as usual, A will send this event to B, C; the goal is to achieve a consensus that it is ok to write this read event into the log
    7. after the event is accepted by majority, A can conclude that they have the latest view of the data - A is sure that between events "x=1" and "read value of x" there are no other events which affect the x; hence the value of x is known
    8. A would return x=1 to the client

    The example from above is strongly consistent - after x=1 is accepted by the system as a whole, every next read will get that value. This approach makes helps achieving strongly consistent systems with high availability.

    Back to your specific questions:

    My first question is whether log append is just a Paxos implementation. We use the same log to ensure consistency. Is log replication or data replication the easiest way to keep data consistent? Is network transmission the significant difference between them?

    Paxos/raft/etc as your said earlier are just protocols to allow nodes to agree on something. We can use that log to build a strongly consistent data storage using linearizable reads.

    Can we treat the Paxos family as a consistency algorithm? If not, how can we ensure data consistency in a distributed system? (In my opinion, Paxos is just a kind of consensus algorithm, not a consistent algorithm)

    Yes, paxos is a consensus algorithm. It could be used to build a system with desired level of consistency. But this is not the only application. Classical example would be a leader election - where nodes have to agree(build consensus) on who is the leader.

    In a distributed system, when and why will we use Paxos? Whether we use Paxos for achieving high system availability.

    There are several popular algs: raft, zab. For example, ZooKeeper is using ZAB. They use it for high availability and consensus. But in general, these algs are used when a system has several nodes and these nodes need to agree on something.