[SOLVED] How does CockroachDB handle partitions between the leaseholder and Raft leader of a range?


CockroachDB tries to keep the leaseholder and the Raft leader of a range on the same node, but this is not enforced (aside from the introduction of leader leases). If the Raft leader and the leaseholder of a range are not the same node, what happens when there is a partition between them?

Let's say the leaseholder forwards a write request to the Raft leader. The Raft leader appends the write to its Raft log and then replicates it to its followers. Now let's say it achieves quorum for the write, but that quorum does NOT include the leaseholder. The write is committed to disk on the Raft leader and on the followers in the quorum, but the leaseholder, not being part of that quorum, does not commit this write.

Wouldn't future read requests, which are executed by the leaseholder, then be served stale data?

Solution

We use latches to sequence concurrent, conflicting requests on the leaseholder. A write request acquires write latches, which block any read request on the same keys with a timestamp at or above the write's timestamp. That's because these reads need to see the MVCC value written by the write. The latches are not released until the write has been committed to the leaseholder's log and then applied to its state machine. So, in the example from the question, all future read requests that could observe the write will wait on these latches until the leaseholder has applied it, which means we won't serve a stale read.
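
To make the sequencing concrete, here is a minimal single-threaded sketch in Python. It is a toy model, not CockroachDB's actual latch manager: the class `ToyLeaseholder`, its methods, and the `BLOCKED` sentinel are hypothetical names invented for illustration, and treating a read at exactly the write's timestamp as conflicting is an assumption of the sketch.

```python
from dataclasses import dataclass, field

BLOCKED = object()  # hypothetical sentinel: the read must wait on a write latch


@dataclass
class ToyLeaseholder:
    """Toy model of a leaseholder replica (illustrative only)."""
    versions: dict = field(default_factory=dict)  # key -> [(ts, value), ...] applied locally
    latches: dict = field(default_factory=dict)   # key -> (ts, value) of the in-flight write

    def begin_write(self, key, ts, value):
        # The write latch is acquired before the write is handed to Raft.
        self.latches[key] = (ts, value)

    def apply_write(self, key):
        # Called only once the entry is in the local log AND applied to the
        # local state machine; only then is the latch released.
        ts, value = self.latches.pop(key)
        self.versions.setdefault(key, []).append((ts, value))

    def read(self, key, ts):
        latch = self.latches.get(key)
        # Assumption: a read at or above the write's timestamp conflicts.
        if latch is not None and latch[0] <= ts:
            return BLOCKED
        # Otherwise return the newest applied version visible at ts.
        visible = [(t, v) for t, v in self.versions.get(key, []) if t <= ts]
        return max(visible, key=lambda tv: tv[0])[1] if visible else None


lh = ToyLeaseholder()
lh.begin_write("k", 10, "v1")
lh.apply_write("k")

# Second write: a quorum elsewhere may have committed it, but the
# leaseholder has not applied it yet, so the latch is still held.
lh.begin_write("k", 20, "v2")
during_partition = lh.read("k", 25)  # waits instead of returning stale "v1"
older_read = lh.read("k", 15)        # below the write's timestamp, not blocked

lh.apply_write("k")                  # leaseholder catches up and applies
after_apply = lh.read("k", 25)
```

In this sketch the read at timestamp 25 is held back while the latch exists and sees "v2" once the leaseholder applies the write, while the read at timestamp 15 proceeds because it could never observe the write at timestamp 20.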