message-queue messaging masstransit outbox-pattern

Transactional outbox confusion

It is common that a service may need to combine database writes with publishing events and/or sending commands.

It appears the only way to guarantee that both happen, or neither does, is a Transactional Outbox: the service that sends the message first stores it in the database as part of the same transaction that updates the business entities. A separate process then sends the stored messages to the message broker.
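To make the write side concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` and `outbox` tables, the `place_order` function and the `fail` flag are all made up for illustration; this is not how MassTransit or any other library implements it. It only shows that the business row and the message row commit or roll back together.

```python
import json
import sqlite3
import uuid


def create_schema(conn):
    # Hypothetical tables: one business table and one outbox table.
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total REAL NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox ("
        " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
        " message_id TEXT NOT NULL UNIQUE,"
        " type TEXT NOT NULL,"
        " body TEXT NOT NULL,"
        " delivered INTEGER NOT NULL DEFAULT 0)"
    )


def place_order(conn, order_id, total, fail=False):
    # `with conn` commits if the block succeeds and rolls back if it raises,
    # so the order row and the outbox row are stored atomically.
    with conn:
        conn.execute("INSERT INTO orders (id, total) VALUES (?, ?)", (order_id, total))
        conn.execute(
            "INSERT INTO outbox (message_id, type, body) VALUES (?, ?, ?)",
            (
                str(uuid.uuid4()),
                "OrderPlaced",
                json.dumps({"order_id": order_id, "total": total}),
            ),
        )
        if fail:
            # Simulates a failure after both writes but before the commit.
            raise RuntimeError("simulated failure before commit")


conn = sqlite3.connect(":memory:")
create_schema(conn)

place_order(conn, "A-1", 10.0)
try:
    place_order(conn, "A-2", 20.0, fail=True)
except RuntimeError:
    pass

orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
pending = conn.execute("SELECT COUNT(*) FROM outbox WHERE delivered = 0").fetchone()[0]
print(orders, pending)
```

The failed call leaves neither an order nor a message behind, which is the whole point: nothing is ever staged for a change that did not commit.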

I'm conflicted, as for every article about needing a Transactional Outbox there's an equivalent article about using a database as a queue being an anti-pattern.
At the point that you are persisting messages in the database, why is a broker still needed? Why not have consumers poll that same table (for example, using MassTransit's SQL transport)?
How do you deal with the reduced throughput, given that all messages must now be stored and polled for?

Solution

There is definitely some confusion around the concept of an outbox, but there are enough videos and blog posts already explaining why they are "necessary."

Necessary is in quotes because proper system design could alleviate the need, but I digress.

A transactional outbox is not a queue; it's a staging area for messages before they are delivered to the broker. Technically, yes, it's a FIFO queue in a table so that messages are delivered in the order they were produced, but it isn't dealing with all the locking and ack/nack behavior that a message broker provides.

This approach is due to the "I changed some DBs, I don't want messages produced if those DBs fail to commit" application style.

This could be avoided by ensuring communications are two-party (HTTP/RabbitMQ, RabbitMQ/DB, etc.) and operations are idempotent. But many bring HTTP/DB/RMQ into the same conversation, thus the "need" for an outbox became a thing. And idempotent handling by "message id" instead of the actual data is "easy", but requires that all events produced have consistent identifiers in the presence of failures/retries. Again, outbox to the rescue.
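As a rough illustration of idempotent handling by message id, the sketch below records each processed id in the same transaction as the state change, so a redelivered message becomes a no-op. The `processed` and `account` tables and the `handle_once` function are hypothetical names, and this is a generic pattern shown with Python's sqlite3 module, not MassTransit's implementation.

```python
import sqlite3


def handle_once(conn, message_id, amount):
    """Apply the message unless this message_id was already processed."""
    try:
        # Both statements commit together; a duplicate message_id violates
        # the primary key and rolls the whole transaction back.
        with conn:
            conn.execute("INSERT INTO processed (message_id) VALUES (?)", (message_id,))
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = 1", (amount,))
    except sqlite3.IntegrityError:
        return False  # duplicate delivery, nothing applied
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL NOT NULL)")
conn.execute("INSERT INTO account (id, balance) VALUES (1, 0)")
conn.commit()

# "msg-1" is delivered twice, as it would be after a retry.
results = [
    handle_once(conn, "msg-1", 50.0),
    handle_once(conn, "msg-1", 50.0),
    handle_once(conn, "msg-2", 25.0),
]
balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(results, balance)
```

This only works if a retried message carries the same id as the original. That is the "consistent identifiers" requirement above, and it is what storing the message (id included) in an outbox gives you.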

Database as a Queue

A broker is just an API/abstraction on top of a database; it doesn't matter whether it is a Raft-esque replicated high-availability queue or JET Blue. They all store data and lock messages.

In fact, MassTransit has its own database transport that provides full broker behavior on top of PostgreSQL and SQL Server. Is it wrong? No. Is it as high-performance as a dedicated RabbitMQ style broker? Probably not. Is it enough for most applications? It depends, but it might be enough.

Polling throughput

With the outbox, consumers are not polling the outbox. A dedicated outbox delivery service reads those messages and sends them out to the broker. And yes, it does read the table and then deliver those messages (transactionally, in MT's case), and it isn't as fast as going directly to the broker, but transaction-based processing gets inherently slower the more parties are engaged in the conversation.
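A delivery service of that kind can be sketched roughly as below, again with Python's sqlite3 module. The `outbox` table layout, the `deliver_pending` function and the in-memory `broker` list are assumptions made for the example; a real service would publish to an actual broker and run on a timer or a notification. MassTransit's own delivery service works differently in detail.

```python
import sqlite3


def deliver_pending(conn, publish, batch_size=100):
    """Send undelivered outbox rows in insertion order; return how many were sent."""
    rows = conn.execute(
        "SELECT seq, message_id, body FROM outbox"
        " WHERE delivered = 0 ORDER BY seq LIMIT ?",
        (batch_size,),
    ).fetchall()
    sent = 0
    for seq, message_id, body in rows:
        # If publish raises, the row stays pending and is retried on the next pass.
        publish(message_id, body)
        with conn:
            conn.execute("UPDATE outbox SET delivered = 1 WHERE seq = ?", (seq,))
        sent += 1
    return sent


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox ("
    " seq INTEGER PRIMARY KEY AUTOINCREMENT,"
    " message_id TEXT NOT NULL UNIQUE,"
    " body TEXT NOT NULL,"
    " delivered INTEGER NOT NULL DEFAULT 0)"
)
with conn:
    conn.executemany(
        "INSERT INTO outbox (message_id, body) VALUES (?, ?)",
        [("m1", "first"), ("m2", "second"), ("m3", "third")],
    )

broker = []  # stand-in for a real broker client


def publish(message_id, body):
    broker.append((message_id, body))


first_pass = deliver_pending(conn, publish, batch_size=2)
second_pass = deliver_pending(conn, publish, batch_size=2)
third_pass = deliver_pending(conn, publish, batch_size=2)
print(first_pass, second_pass, third_pass, broker)
```

Note that a crash between `publish` and the `UPDATE` would resend the message on the next pass. So this sketch is at-least-once delivery, and consumers still need to tolerate duplicates. The extra read and write per message is the throughput cost being discussed here.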

It's all about tradeoffs for the sake of application consistency.