microservicesdistributed-transactionsevent-driven-design

How to achieve distributed transaction consistency in a microservices architecture without using 2PC?


In a microservice architecture, each service typically manages its own database and executes operations independently. However, when dealing with business processes that span multiple microservices, ensuring data consistency across these services becomes a complex challenge, especially when operations need to happen atomically. Traditional monolithic architectures can rely on ACID transactions, but in a distributed system, this isn't feasible at scale.

The Two-Phase Commit (2PC) protocol is a well-known solution for distributed transactions, but it's widely considered inefficient and impractical in modern high-scale systems due to issues like blocking, high latency, and single points of failure.

I'm dealing with a highly distributed microservice architecture where multiple services need to participate in a transaction-like process (e.g., a payment service interacting with an inventory service and a shipment service). Each service manages its own database, and the databases vary (e.g., NoSQL for some services, SQL for others).

I need to ensure consistency across services, but:

What are the most efficient patterns or techniques to achieve distributed consistency across microservices without resorting to Two-Phase Commit (2PC) and while maintaining high performance and scalability?


Solution

  • Short answer: You can't. Because of the things you so well describe, immediate consistency between two microservices with their own databases is not an option.

    Instead of prevention, think of compensation. You accept inconsistency temporarily, but make sure that in the end, eventually, you'll reach a consistent state again.

    Let's say you have operations in Service A and B, and either both of them have to be executed, or neither of them must be, for the system to be in a consistent state. So you perform the action in Service A. If it's successful, you perform the action in Service B. If that's also successful, all good. If not, you need to perform a compensating action in Service A that reverts the original action.

    There are at least two fundamental approaches to this, an event-driven one (choreography) and a command-driven one (orchestration).

    Personally I'm a big proponent of event-driven architecture, so I suggest you look into everything on the "awesome EDA" list and fundamentally change the way you design systems. I'll also plug my own talk about the topic.

    For the full picture, you might also want to look into things like the "saga pattern" here or here. There are also companies claiming their tools help you implement such flows, such as temporal.io.

    Lastly, a great, fundamental paper on the topic is "Life beyond distributed transactions" by Pat Helland.