algorithmdistributeddistributed-systempaxosconsensus

Contradiction in Lamport's Paxos made simple paper


Phase 2. (a) If the proposer receives a response to its prepare requests (numbered n) from a majority of acceptors, then it sends an accept request to each of those acceptors for a proposal numbered n with a value v, where v is the value of the highest-numbered proposal among the responses, or is any value if the responses reported no proposals.

As mentioned in the paper,

A proposer issues a proposal by sending, to some set of acceptors, a request that the proposal be accepted. (This need not be the same set of acceptors that responded to the initial requests.)"

But as my understanding, if we change Phase 2. (a) to:

If the proposer receives a response to its prepare requests (numbered n) from a majority of acceptors, then it sends an accept request to an arbitrary set of majority acceptors for a proposal numbered n with a value v, where v is the value of the highest-numbered proposal among the responses, or is any value if the responses reported no proposals.

the algorithm will fail, following is an example. Consider that there are totally 3 acceptors ABC. We will use X(n:v,m) to denote the status of acceptor X: proposal n:v is the largest numbered proposal accepted by X where n is the proposal number and v is the value of the proposal, and m is the number of the largest numbered prepare request that X has ever responded.

  1. P1 sends 'prepare 1' to AB
  2. Both AB respond P1 with a promise to not to accept any request numbered smaller than 1. Now the status is: A(-:-,1) B(-:-,1) C(-:-,-)
  3. P1 receives the responses, then gets stuck and runs very slowly
  4. P2 sends 'prepare 100' to AB
  5. Both AB respond P2 with a promise to not to accept any request numbered smaller than 100. Now the status is: A(-:-,100) B(-:-,100) C(-:-,-)
  6. P2 receives the responses, chooses a value b and sends 'accept 100:b' to BC
  7. BC receive and accept the accept request, the status is: A(-:-,100) B(100:b,100) C(100:b,-). Note that proposal 100:b has been chosen.
  8. P1 resumes, chooses value a and sends 'accept 1:a' to BC
  9. B doesn't accept it, but C accepts it because C has never promise anything. Status is: A(-:-,100) B(100:b,100) C(1:a,-). The chosen proposal is abandon, Paxos fails.

Did I miss anything here? Thanks.


Solution

  • You missed something in step 7. When C processes accept 100:b it sets its state to C(100:b,100). By accepting a value the node is also promising to not accept earlier values.


    Update. I've been thinking about this all month because I knew the above answer was not absolutely correct.

    What's more I looked through several proprietary and open-source paxos implementations and they all had the bug submitted by the OP!

    So here's the correct answer, when viewed entirely from Paxos Made Simple:

    If the proposer receives a response to its prepare requests (numbered n) from a majority of acceptors, then it sends an accept request to each of those acceptors for a proposal numbered n with a value v, where v is the value of the highest-numbered proposal among the responses, or is any value if the responses reported no proposals. (emphasis mine)

    In other words, the proposer can only send Accept messages to acceptors that it has received Promises from for that ballot number.

    So, is this a contradiction in Lamport's paper? Right now, I'm saying yes.


    If you look at Lamport's paxos proofs he treats an accept as a promise, just as my original answer suggests. But this is not pointed out in Paxos Made Simple. In fact, it appears Lamport took great pains to specify that an accept was not a promise.

    The problem is when you combine the weaker portions of both variants; as the OP did and several implementations do. Then you run into this catastrophic bug.