cqrs event-sourcing apache-kafka ddd

Using Kafka as a (CQRS) Eventstore. Good idea?


Although I've come across Kafka before, I only recently realized that it could perhaps be used as (the basis of) a CQRS event store.

Kafka seems to support many of the main points an event store needs.

Admittedly I'm not 100% versed in CQRS / event sourcing, but this seems pretty close to what an event store should be. The funny thing is: I really can't find much about Kafka being used as an event store, so perhaps I am missing something.

So, is anything missing from Kafka that would keep it from being a good event store? Would it work? Is anyone using it in production? I'm interested in insight, links, etc.

Basically, the state of the system is saved as the sequence of transactions/events the system has ever received, instead of only the current state/snapshot of the system, which is what is usually done. (Think of it as a general ledger in accounting: all transactions ultimately add up to the final state.) This allows all kinds of cool things, but just read up on the links provided.
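The ledger analogy can be sketched in a few lines. This is a minimal, store-agnostic illustration (hypothetical domain types, not Kafka-specific): the current state is never stored directly, it is derived by folding over the full event log.

```python
from dataclasses import dataclass

# Hypothetical domain events for a bank account (illustration only).
@dataclass(frozen=True)
class Deposited:
    amount: int

@dataclass(frozen=True)
class Withdrawn:
    amount: int

def apply_event(balance: int, event) -> int:
    """Apply a single event to the current state."""
    if isinstance(event, Deposited):
        return balance + event.amount
    if isinstance(event, Withdrawn):
        return balance - event.amount
    raise TypeError(f"unknown event: {event!r}")

def replay(events) -> int:
    """Reconstitute the current state by replaying the whole log,
    like summing the entries of a general ledger."""
    balance = 0
    for event in events:
        balance = apply_event(balance, event)
    return balance

log = [Deposited(100), Withdrawn(30), Deposited(5)]
print(replay(log))  # 75
```

The key property an event store must provide, then, is a durable, ordered, replayable log of these events per entity.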


Solution

  • Kafka is meant to be a messaging system, which has many similarities to an event store; however, to quote their intro:

    The Kafka cluster retains all published messages—whether or not they have been consumed—for a configurable period of time. For example if the retention is set for two days, then for the two days after a message is published it is available for consumption, after which it will be discarded to free up space. Kafka's performance is effectively constant with respect to data size so retaining lots of data is not a problem.

    So while messages can potentially be retained indefinitely, the expectation is that they will be deleted. This doesn't mean you can't use this as an event store, but it may be better to use something else. Take a look at EventStoreDB for an alternative.
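    That said, retention is configurable per topic. As a sketch (assuming a reasonably recent Kafka installation where `kafka-topics.sh` takes `--bootstrap-server`; older releases used `--zookeeper`), a topic can be created with unlimited retention by setting `retention.ms=-1`:

    ```shell
    # Create an events topic that never discards messages
    # (retention.ms=-1 disables time-based deletion).
    kafka-topics.sh --bootstrap-server localhost:9092 \
      --create --topic account-events \
      --partitions 1 --replication-factor 1 \
      --config retention.ms=-1
    ```

    So deletion is a default, not an obligation; but the design assumptions of the system still lean toward eventual discard, which is the point the answer is making.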

    UPDATE

    Kafka documentation:

    Event sourcing is a style of application design where state changes are logged as a time-ordered sequence of records. Kafka's support for very large stored log data makes it an excellent backend for an application built in this style.

    UPDATE 2

    One concern with using Kafka for event sourcing is the number of required topics. Typically in event sourcing, there is a stream (topic) of events per entity (such as user, product, etc). This way, the current state of an entity can be reconstituted by re-applying all events in the stream. Each Kafka topic consists of one or more partitions and each partition is stored as a directory on the file system. There will also be pressure from ZooKeeper as the number of znodes increases.
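    A commonly suggested workaround (my sketch, not part of the original answer) is one topic per entity *type* rather than per entity, with the entity id as the message key. Kafka routes all messages with the same key to the same partition and guarantees ordering within a partition, so each entity's events stay ordered, and one entity is reconstituted by replaying the topic and filtering on the key:

    ```python
    # Simulate a single "users" topic keyed by entity id; names and
    # payloads are hypothetical. Per-entity order is preserved because
    # same-keyed events share a partition in Kafka.
    def rebuild(topic_events, entity_id):
        """Reconstitute one entity's state from a shared, keyed stream."""
        state = {}
        for key, payload in topic_events:
            if key == entity_id:
                state.update(payload)  # apply this entity's event
        return state

    users_topic = [
        ("user-1", {"name": "Ada"}),
        ("user-2", {"name": "Bob"}),
        ("user-1", {"email": "ada@example.com"}),
    ]
    print(rebuild(users_topic, "user-1"))
    # {'name': 'Ada', 'email': 'ada@example.com'}
    ```

    The trade-off is that rebuilding a single entity now means scanning (a partition of) the whole topic rather than reading one small stream, which is exactly where dedicated event stores differ.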