could anyone tell me why the proposal Id needs to be unique in Paxos? I think the reason why this the proposalId needs to be unique is that we need to use it to reject the old proposal and sort the max vote. So if we make the phase one: acceptor only accepts the proposal greater than the promisedId and it is incremental, it still can guarantee the consistency.
We assume proposer A makes a proposal (proposalId x, value y) to acceptors, then he got the majority reply with approval, another proposer B with the same proposal id(x) issue a proposal request, this proposer B will be rejected, right? And eventually, we still can reach consistency, right?
The short answer is that the proof of correctness depends on numbers being unique to each node at each round.
An intuitive answer is as follows:
If proposal numbers are not unique between nodes in the cluster you could have two nodes propose different values for the same number. Under arbitrary message loss some nodes may accept one value and some the other. A potential new leader will ask all nodes their highest accepted value. It will then get back responses that are different values for the same number. It cannot then disambiguate and decide which value is the one to choose next to bring the cluster to consistency. Using unique numbers ensures that each value has a unique number in each round. That ensures that a new leader can correctly pick the highest accepted value.
Contrived Scenario:
A five node cluster with node A
the leader. The network goes haywire. Node B
thinks it needs to lead as it suspects node A
is dead due to lost messages. Node B
proposes the same number last used by A
. Nodes A
and B
are actually both up and attempt to transmit a different value to the other nodes. The power fails and all nodes go dark and some but not all messages got through as the network got disrupted due to the rolling power failure.
Next the power comes back on but nodes A
and E
remain dead. Nodes B
, C
and D
can form a quorum. Node C
proposes a fresh high number and gets back two different values for the highest accepted value. One originates from A
and the other from B
. It now has to choose between them. Which value is guaranteed to bring the cluster to the correct consistency?
Imagine that in the surviving quorum only one node has the A
value but two nodes have the B
value. So let's say we guess B
s value but that could be wrong. Dead nodes A
and E
may have A
s value. Node A
happened to see a majority of responses for its value prior to the power failure as it's message happened to get through to two other nodes. The value was a payment and it sent the money out of the company when it saw a majority response. Yet the surviving quorum decided that A
s value never happened and bring the cluster to consistency with B
s value.
The Fix:
All you need to do is uniquely number each node and encode the node unique number into the least significant bits of the ballot numbers. Then each node uses numbers unique to it and can easily generate a new number just higher than the last highest number used within the cluster. If we did that only one value would be the highest accepted at any round.
Any new leader with a majority will then see the highest value used by the last leader if that last leader got a majority response. The new leader won't then contradict the last leader it will collaborate. The new leader doesn't need to know whether the dead leader actually saw a majority response and acted upon it. Rather it makes the conservative choice and assumes it may have and acts appropriately.