I'm still struggling with what must be basic (and already resolved) issues related to CQRS-style architecture:
How do we implement business rules that rely on a set of Aggregate Roots?
Take, as an example, a booking application. It may enable you to book tickets for a concert, seats for a movie or a table at a restaurant. In all cases, there's only going to be a limited number of 'items' for sale.
Let's imagine that the event or place is very popular. When sales open for a new event or time slot, reservations start to arrive very quickly - perhaps many per second.
On the query side we can scale massively, and reservations are put on a queue to be handled asynchronously by an autonomous component. At first, as we pull Reservation Commands off the queue, we will accept them, but at some point we will have to start rejecting the rest.
How do we know when we reach the limit?
For each Reservation Command we would have to query some sort of store to figure out if we can accommodate the request. This means that we will need to know how many reservations we have already received at that time.
However, if the Domain Store is a non-relational data store such as Windows Azure Table Storage, we can't very well do a SELECT COUNT(*) FROM ...
One option would be to keep a separate Aggregate Root that simply keeps track of the current count, like this:
The second Aggregate Root would be a denormalized aggregation of the first one, but when the underlying data store doesn't support transactions, then it's very likely that these can get out of sync in high-volume scenarios (which is what we are trying to address in the first place).
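A minimal Python sketch of such a counting Aggregate Root, assuming purely in-memory state. The names `CapacityCounter` and `try_reserve` are hypothetical and only illustrate the idea; they do not come from any library or from the original design.

```python
from dataclasses import dataclass


@dataclass
class CapacityCounter:
    """Hypothetical second Aggregate Root: tracks how many items are taken."""

    capacity: int
    reserved: int = 0

    def try_reserve(self, quantity: int) -> bool:
        """Accept the request only if it still fits within the capacity."""
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        if self.reserved + quantity > self.capacity:
            return False  # sold out for this request size
        self.reserved += quantity
        return True


# Three Reservation Commands against an event with three items for sale.
counter = CapacityCounter(capacity=3)
results = [counter.try_reserve(2), counter.try_reserve(2), counter.try_reserve(1)]
```

The check-then-increment in `try_reserve` is only safe while a single handler owns the counter. Once the counter and the reservations live in separate, non-transactional stores, nothing keeps the two in step, which is exactly the concern raised above.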
One possible solution is to serialize handling of the Reservation Commands so that only one at a time is handled, but this goes against our goals of scalability (and redundancy).
Such scenarios remind me of standard "out of stock" scenarios, but the difference is that we can't very well put the reservation on back order. Once an event is sold out, it's sold out, so I can't see what a compensating action would be.
How do we handle such scenarios?
After thinking about this for some time, it finally dawned on me that the underlying problem is less related to CQRS than it is to the non-transactional nature of disparate REST services.
Really it boils down to this problem: if you need to update several resources, how do you ensure consistency if the second write operation fails?
Let's imagine that we want to write updates to Resource A and Resource B in sequence.
The first write operation can't easily be rolled back in the face of an exception, so what can we do? Catching and suppressing the exception in order to perform a compensating action against Resource A is not a viable option. First, it's complex to implement; second, it's not safe: what happens if the exception was caused by a failed network connection? In that scenario, we can't perform a compensating action against Resource A either.
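The failure mode can be reproduced with a small Python sketch. It assumes two in-memory stand-ins for the remote resources; `FlakyStore`, `handle` and the key `"r-1"` are hypothetical names chosen for illustration.

```python
class FlakyStore:
    """In-memory stand-in for a remote resource that can be made to fail."""

    def __init__(self, fail: bool = False):
        self.data = {}
        self.fail = fail

    def write(self, key, value):
        if self.fail:
            raise ConnectionError("simulated network failure")
        self.data[key] = value


resource_a = FlakyStore()
resource_b = FlakyStore(fail=True)


def handle(reservation_id):
    """Write to Resource A and then to Resource B, with no transaction."""
    resource_a.write(reservation_id, "reserved")  # succeeds
    resource_b.write(reservation_id, "counted")   # raises ConnectionError


try:
    handle("r-1")
    failed = False
except ConnectionError:
    failed = True
```

After the exception, Resource A holds the reservation while Resource B knows nothing about it, and there is no guarantee that a follow-up call to undo the first write would get through either.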
The key lies in explicit idempotency. While Windows Azure Queues don't guarantee exactly-once semantics, they do guarantee at-least-once semantics. This means that in the face of intermittent exceptions, the message will be replayed later.
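In the previous scenario, this is what happens then: the write to Resource A succeeds, the write to Resource B fails, and the message goes back on the queue. When the message is replayed, the write to Resource A is repeated without any further effect, and the write to Resource B gets another chance to succeed.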
When all write operations are idempotent, eventual consistency can be achieved with message replays.
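A rough Python sketch of that replay, assuming an in-memory queue that puts a failed message back (standing in for at-least-once delivery) and writes keyed by message id so that repeating them is harmless. `Store`, `handle`, `process` and the message shape are all hypothetical.

```python
from collections import deque


class Store:
    """In-memory stand-in for a remote resource with idempotent writes."""

    def __init__(self, failures: int = 0):
        self.data = {}
        self.failures = failures  # number of writes that fail before succeeding

    def put(self, key, value):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("simulated intermittent failure")
        # Keyed by message id: writing the same key and value again changes nothing.
        self.data[key] = value


def handle(message, resource_a, resource_b):
    """Write to both resources in sequence; safe to run more than once."""
    resource_a.put(message["id"], message["seats"])
    resource_b.put(message["id"], message["seats"])


def process(queue, resource_a, resource_b, max_deliveries=10):
    """Deliver messages until the queue is empty, replaying any that fail."""
    deliveries = 0
    while queue:
        if deliveries >= max_deliveries:
            raise RuntimeError("giving up after too many deliveries")
        message = queue.popleft()
        deliveries += 1
        try:
            handle(message, resource_a, resource_b)
        except ConnectionError:
            queue.append(message)  # the message becomes visible again
    return deliveries


resource_a = Store()
resource_b = Store(failures=1)  # the first write to Resource B fails
queue = deque([{"id": "r-1", "seats": 2}])
deliveries = process(queue, resource_a, resource_b)
```

The message is delivered twice, Resource A is written twice, and both resources still end up with a single entry for `r-1`. Note that an increment-style write (`count += seats`) would not be idempotent; keying each write by the message id is what makes the replay safe in this sketch.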