LMAX Disruptor is generally implemented using the following approach:
As in this example, Replicator is responsible for replicating the input events\commands to the slave nodes. Replicating across a set of nodes requires us to apply consensus algorithms, in case we want the system to available in the presence of network failures, master failure and slave failures.
I was thinking of applying RAFT consensus algorithm to this problem. One observation is that: "RAFT requires that the input event\commands are stored to the disk (durable storage) during replication" (Reference this link)
This observation essentially means that we cannot perform a in-memory replication. Hence it appears that we might have to combine the functionality of replicator and journaller to be able to successfully apply RAFT algorithm to LMAX.
There are two options to do this:
Option 1: Using the replicated log as input event queue
I think a disadvantage of this option has got to do with fact that we do an additional data copy step (receiver to event queue instead of the ring buffer).
Option 2: Use Replicator to push input events\commands to slave's input log file
I was wondering if there is any other solution to design of Replicator? What are the different design options that people have employed for replicators? Particularly any design that can support in-memory replication?
Your intuition is correct about folding the replication and journalling into the Raft component. But, the Raft protocol dictates exactly when things need to be stored on disk.
Here are two different ways to look at it.
I'm assuming there is no is hefty computation, such as a transaction processing, before the replication because you don't have any in your diagrams.
I, personally, would do the first because it separates concerns into different processes. If I was implementing Raft for myself I would take the first half of the second scenario and put it in its own process.
In which Raft is implement by an external process.
The replication component outsources to an external Raft process the business of replication. After some time, Raft responds to the replication component that it is, in fact, replicated. The replication component updates the items in the ring buffer, and moves its published cursor forward. The business logic sees the published cursor (via waitFor
) and consumes the freshly replicated data.
In this scenario, the replication component probably has a lot of inflight events, so it's read cursor is far ahead of the cursor it publishes to the business logic.
There is no need for a journalling component in this scenario because the external raft system does the journalling for you.
Note, the replication may be the slowest component of the system!
In which raft is implemented in the same process as the "Real Business Logic."
In terms of Raft, replication is the business logic. Actually, you have multiple levels of business logic, or equivalently, multiple stages of business logic.
I'm going to use two input disruptors and two output disruptors for this to emphasize the separate business logic. You can combine, split, or rearrange to your heart's content. Or your profiler's content.
The first stage, as I mentioned, is Raft replication. Client events go into the Replication Input Disruptor. The Raft logic picks it up, perhaps in batches, and sends out to the Followers on the Replication Output Disruptor. All Raft messages also go into the Replication Input Disruptor. The Raft logic also picks these up and sends the appropriate responses to the appropriate Followers/Master on the Replication Output Disruptor).
A journaller component hangs off the Input Ring Buffer; it only has to handle certain types of messages as dictated by Raft. This will likely be the slowest part of the system.
When the data is considered replicated, it is moved to the second stage, via the "Real Business Logic" Input Disruptor. There it is processed, sent to the Client Outbound Disruptor, and then sent to one of your millions of happy paying customers.