raft

Lecture 6: Fault Tolerance: Raft (1) MIT 6.824: Distributed Systems


I am watching Lecture 6: Fault Tolerance: Raft (1) of MIT 6.824: Distributed Systems. At 1:16:25 the professor discusses which log scenarios are possible. Video link: [https://youtu.be/64Zp3tzNbpE?t=4585] According to him, the following scenario is possible.

    10   11   12   13   14   15
S1: 3
S2: 3    3    4
S3: 3    3    5

Explanation:

  1. At log index 10, all servers were working in term 3.
  2. Then at index 11, S1 crashes; S2 and S3 still form a majority, so the entry at 11 gets replicated.
  3. Then at index 12, S3 crashes; S2 becomes leader in the new term 4 and writes to its log, but crashes before replicating the entry.
  4. Next, S3 becomes leader again, for term 5, and writes to its log.

I believe this is not possible: S3's last known term is 3, so it cannot jump to term 5 for a new election, because only S2 knows about the election in term 4.

I am trying to understand how this scenario is possible.


Solution

  • Around 1:15:50 there is a brief explanation.

    We are in a good state at log position 11:

    Now S3 goes offline for a brief moment, so S2 misses a heartbeat and starts an election for term 4.

    S3 comes back online right before the RequestVote arrives, and S3 votes yes for S2 as leader for term 4. This is the point at which S3 learns that the current term is 4; I think this is the core of your question.
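To make the term update concrete, here is a minimal sketch of how a follower might handle RequestVote, loosely following the rules in the Raft paper. The `Server` class, its field names, and the dict-based log are hypothetical, chosen only for illustration; this is not the lab's actual code.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Server:
    """Hypothetical, simplified Raft server state (illustration only)."""
    name: str
    current_term: int
    log: Dict[int, int] = field(default_factory=dict)  # log index -> term of that entry
    voted_for: Optional[str] = None

    def last_log(self) -> Tuple[int, int]:
        """Return (term, index) of the last log entry, or (0, 0) if the log is empty."""
        if not self.log:
            return (0, 0)
        index = max(self.log)
        return (self.log[index], index)

    def handle_request_vote(self, term: int, candidate: str,
                            last_log_index: int, last_log_term: int) -> Tuple[int, bool]:
        # A candidate from an older term is rejected outright.
        if term < self.current_term:
            return self.current_term, False
        # A newer term is adopted immediately; this is how a follower
        # learns about a term it did not start itself.
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None
        # Grant the vote only if the candidate's log is at least as up to date
        # (compare last term first, then last index).
        up_to_date = (last_log_term, last_log_index) >= self.last_log()
        if self.voted_for in (None, candidate) and up_to_date:
            self.voted_for = candidate
            return self.current_term, True
        return self.current_term, False


# S3 is back online in term 3 with entries at indices 10 and 11.
s3 = Server("S3", current_term=3, log={10: 3, 11: 3})
# S2 timed out, moved to term 4, and asks S3 for its vote.
term, granted = s3.handle_request_vote(term=4, candidate="S2",
                                       last_log_index=11, last_log_term=3)
print(term, granted, s3.current_term)  # 4 True 4
```

The point of the sketch is that S3 never has to be a candidate in term 4 to know about it: receiving any request that carries a higher term is enough to move its own term forward.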

    At this point we are still at log position 11, since no client commands have arrived.

    State of the system:

    Now this is the sequence of next events:

    This last sequence of events leads to the log configuration described in the question.
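As a rough check, the whole scenario can be replayed in a few lines. The sketch below assumes one plausible ordering that is consistent with steps 3 and 4 of the question: S2 wins term 4 with S3's vote, appends at index 12 and crashes, then S1 restarts and votes for S3 in term 5. The `Node` class and the `elect` helper are hypothetical names, and the model ignores timers, heartbeats, and AppendEntries.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical, heavily simplified Raft node (illustration only)."""
    name: str
    term: int = 3
    log: Dict[int, int] = field(default_factory=dict)  # log index -> term
    voted_for: Optional[str] = None
    up: bool = True

    def last(self) -> Tuple[int, int]:
        """(term, index) of the last log entry."""
        index = max(self.log)
        return (self.log[index], index)

    def grant_vote(self, cand: "Node") -> bool:
        # A crashed node cannot answer; a stale candidate is rejected.
        if not self.up or cand.term < self.term:
            return False
        # Seeing a higher term makes the voter adopt it.
        if cand.term > self.term:
            self.term, self.voted_for = cand.term, None
        # Vote only for a candidate whose log is at least as up to date.
        if self.voted_for in (None, cand.name) and cand.last() >= self.last():
            self.voted_for = cand.name
            return True
        return False


def elect(cand: Node, peers: List[Node]) -> bool:
    """Candidate bumps its term, votes for itself, and needs a majority."""
    cand.term += 1
    cand.voted_for = cand.name
    votes = 1 + sum(p.grant_vote(cand) for p in peers)
    return votes > (len(peers) + 1) // 2


s1 = Node("S1", log={10: 3}, up=False)  # crashed after index 10
s2 = Node("S2", log={10: 3, 11: 3})
s3 = Node("S3", log={10: 3, 11: 3})

won_term4 = elect(s2, [s1, s3])   # S3 votes yes and now knows term 4
s2.log[12] = s2.term              # S2 appends locally in term 4 ...
s2.up = False                     # ... and crashes before replicating
s1.up = True                      # assumption: S1 has restarted by now
won_term5 = elect(s3, [s1, s2])   # S3 moves from term 4 to 5; S1 votes yes
s3.log[12] = s3.term              # S3 appends in term 5

for node in (s1, s2, s3):
    print(node.name, node.log)
# S1 {10: 3}
# S2 {10: 3, 11: 3, 12: 4}
# S3 {10: 3, 11: 3, 12: 5}
```

Note that in this replay S3 needs S1's vote for term 5: S2's log ends with a term 4 entry, so even if S2 were up it would consider S3's log less up to date and refuse.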