Tags: architecture, event-sourcing, disruptor-pattern, lmax

LMAX Architecture - Growth of data


Consider the following scenario from Martin Fowler's description of the LMAX architecture:

I'll use a simple non-LMAX example to illustrate. Imagine you are making an order for jelly beans by credit card. <...>

In the LMAX architecture, you would split this operation into two. The first operation would capture the order information and finish by outputting an event (credit card validation requested) to the credit card company. The Business Logic Processor would then carry on processing events for other customers until it received a credit-card-validated event in its input event stream. On processing that event it would carry out the confirmation tasks for that order.

So the order is held in memory until the result of the payment processing is received.
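To make the flow concrete, here is a minimal sketch of the two-step processing described in the quote. It is not LMAX code: the class name `BusinessLogicProcessor`, the method names, and the event dictionaries are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    order_id: int
    customer: str
    item: str


@dataclass
class BusinessLogicProcessor:
    # Hypothetical stand-in for the processor: orders awaiting an external
    # result stay in memory, keyed by order id.
    pending: dict = field(default_factory=dict)
    confirmed: list = field(default_factory=list)

    def on_order_placed(self, order):
        """First operation: capture the order and emit a validation request."""
        self.pending[order.order_id] = order
        return {"type": "credit-card-validation-requested",
                "order_id": order.order_id}

    def on_card_validated(self, order_id):
        """Second operation: confirm the order once the result event arrives."""
        order = self.pending.pop(order_id)
        self.confirmed.append(order)
        return {"type": "order-confirmed", "order_id": order_id}


blp = BusinessLogicProcessor()
out1 = blp.on_order_placed(Order(1, "alice", "jelly beans"))
out2 = blp.on_order_placed(Order(2, "bob", "jelly beans"))
pending_before = len(blp.pending)  # both orders are held in memory
out3 = blp.on_card_validated(1)
pending_after = len(blp.pending)   # only order 2 is still waiting
```

The `pending` dictionary is the in-memory data whose growth is in question: every order stays there until its result event arrives.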

Now let us assume that instead of the credit card processing step, we have a step that takes much longer. For example, we need to perform an inventory check, where somebody has to physically verify that we have the particular flavor of jelly bean that was ordered. This might take an hour.

If this is the case, will it not lead to growth of the data held in memory, because potentially a lot of orders will be awaiting the inventory-status-updated event?

Possibly, in such a scenario, we need to remove the order from memory and include it as part of the output event. An external system (inventory) is then responsible for generating another input event that includes the order details.

The problem I see with this approach is that we cannot include inventory as part of the business logic processor.
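A rough sketch of that alternative, under stated assumptions: the processor keeps no pending orders, the output event carries the full order, and the external inventory system echoes it back in its reply. The names `StatelessProcessor` and `external_inventory_system` and the event shapes are hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass
class Order:
    order_id: int
    customer: str
    flavor: str


class StatelessProcessor:
    """Keeps no pending orders; the events carry the order details."""

    def __init__(self):
        self.confirmed = []

    def on_order_placed(self, order):
        # Nothing is retained: the full order travels in the output event.
        return {"type": "inventory-check-requested", "order": asdict(order)}

    def on_inventory_checked(self, event):
        # Assumption: the external system echoes the order back unchanged.
        order = Order(**event["order"])
        if event["in_stock"]:
            self.confirmed.append(order)
        return order


def external_inventory_system(request, in_stock):
    """Stand-in for the slow inventory check that lives outside the processor."""
    return {"type": "inventory-status-updated",
            "order": request["order"],
            "in_stock": in_stock}


processor = StatelessProcessor()
request = processor.on_order_placed(Order(7, "carol", "liquorice"))
reply = external_inventory_system(request, in_stock=True)
restored = processor.on_inventory_checked(reply)
```

Here the inventory logic sits entirely in `external_inventory_system`, outside the processor, which is exactly the drawback noted above.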

Any thoughts on how to address this?


Solution

  • Working orders in a financial exchange can stay around for days, or even months, as part of the working set, for example while waiting for a futures contract to expire. Customer accounts are similar. By "working set" I mean deals, orders, sales, etc. that are currently active. Once a deal is complete, it becomes part of the historical data.

    Memory systems are now so large, with hundreds of GBs in a single server, that the working set of almost any business easily fits into memory. Available memory is also increasing at a rate much faster than any large business is growing.

    The scenario you describe is not really an issue. What can become an issue is needing to hold all historical data, for which a traditional database or file-based system is more suitable.

    A simple exercise is to calculate the memory required for the active entities, or working set, and then compare that to what is available in modern servers. It is possible to keep hundreds of millions of active entities in memory.
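The exercise can be sketched as a back-of-the-envelope calculation. Every figure below (order rate, bytes per order, entity count) is an assumption chosen for illustration, not a measurement from LMAX or any real system.

```python
GIB = 1024 ** 3


def working_set_bytes(active_entities, bytes_per_entity):
    """Memory needed to hold every active entity at once."""
    return active_entities * bytes_per_entity


# Assumed figures for the jelly bean scenario: one million orders arrive
# per hour, each waits one hour for the inventory check, and each order
# occupies about 500 bytes in memory.
orders_per_hour = 1_000_000
hours_waiting = 1
bytes_per_order = 500

pending_orders = orders_per_hour * hours_waiting
required = working_set_bytes(pending_orders, bytes_per_order)

# A much larger assumed working set: 200 million active entities.
large = working_set_bytes(200_000_000, bytes_per_order)

print(f"pending orders: {required / GIB:.2f} GiB")     # 0.47 GiB
print(f"200M active entities: {large / GIB:.1f} GiB")  # 93.1 GiB
```

Under these assumptions, even the larger working set stays below the hundreds of GBs mentioned above.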