I am creating a system in which I use Kafka as an event store. The problem I am having is that I cannot guarantee the message ordering across all of the events.
Let's say I have a User entity and an Order entity. Right now I have the topics configured as follows:
user-deleted
user-created
order-deleted
order-created
When consuming these topics from the start (when a new consumer group registers), the user-deleted topic gets consumed first, then user-created, and so on. The problem with this is that events are only consumed chronologically within a topic, not across multiple topics.
Let's say 2 users get created and after that one of them gets deleted. The correct result would be one remaining user.
Events:
user-created (user 1)
user-created (user 2)
user-deleted (user 1)
My system would consume these like:
user-deleted (user 1)
user-created (user 1)
user-created (user 2)
Which means the result is 2 remaining users, which is wrong.
I do set the partition key (the user id), but this only seems to guarantee ordering within a topic (strictly, within a partition). How does this problem normally get tackled? I have seen people use a topic per entity, which would result in 2 topics for this example (user and order), but this can still cause issues with related entities.
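For reference, this is roughly how I produce these events at the moment (broker address, serializers and payloads below are just illustrative):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class CurrentProducerSetup {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // one topic per event type, keyed by the user id
            producer.send(new ProducerRecord<>("user-created", "user-1", "{\"name\":\"X\"}"));
            producer.send(new ProducerRecord<>("user-created", "user-2", "{\"name\":\"Y\"}"));
            // the key only orders events within this topic's partition,
            // not relative to the user-created topic
            producer.send(new ProducerRecord<>("user-deleted", "user-1", "{}"));
        }
    }
}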
What you've designed are "request/response topics", and you cannot guarantee ordering across multiple topics this way.
Instead, design "entity topics" or "event topics". This way, ordering will be guaranteed, and you only need one topic per entity. For example:
users
With key=userId, you can structure events this way:
userId, {userId: userId, name:X, ...} // user created
userId, {userId: userId, name:Y, ...} // user updated
userId, null // user deleted (tombstone)
Use a compacted topic for the event store, so that all deletes become tombstones and are dropped from any materialized view.
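A minimal producer sketch of that layout, assuming string keys, JSON string values, and a users topic created with cleanup.policy=compact (broker address and payloads are illustrative):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class UserTopicProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Every event for a user carries the same key, so all of them land
            // in the same partition and keep their order relative to each other.
            producer.send(new ProducerRecord<>("users", "user-1", "{\"userId\":\"user-1\",\"name\":\"X\"}"));
            producer.send(new ProducerRecord<>("users", "user-1", "{\"userId\":\"user-1\",\"name\":\"Y\"}"));
            // A null value is a tombstone; compaction eventually removes the key,
            // and a materialized view treats it as a delete.
            producer.send(new ProducerRecord<>("users", "user-1", null));
        }
    }
}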
You could go a step further and create a wrapper record.
userId, {action:CREATE, data:{name:X, ...}} // full-record
userId, {action:UPDATE, data:{name:Y}} // partial record
userId, {action:DELETE} // no data needed
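A hypothetical Java shape for that envelope (names are only illustrative; in practice you would likely define it in Avro/Protobuf/JSON Schema):

import java.util.Map;

// Wrapper/envelope for user events; DELETE carries no data.
public record UserEvent(Action action, Map<String, Object> data) {
    public enum Action { CREATE, UPDATE, DELETE }
}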
This topic acts as your "event entity topic", but then you need a stream processor to parse these events and project them consistently into the format above, for example turning any action:DELETE into a null value (tombstone) and writing the result to a compacted topic (perhaps materialized automatically as a Kafka Streams KTable).
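A rough Kafka Streams sketch of that processor, assuming string-serialized JSON values and illustrative topic names users-events (the wrapper topic) and users (the compacted entity topic); real code would parse the JSON properly and merge partial UPDATE payloads with the previous state:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

public class UserEventProjector {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "user-event-projector");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Wrapper events, keyed by userId.
        KStream<String, String> events = builder.stream("users-events");

        // Turn DELETE events into tombstones and forward everything else.
        // This is a crude string check; real code would deserialize the JSON.
        events
            .mapValues(value -> (value != null && value.contains("\"action\":\"DELETE\"")) ? null : value)
            .to("users"); // the compacted entity topic (cleanup.policy=compact)

        // Materialize the compacted topic as the current view:
        // one row per userId, with deleted users absent.
        KTable<String, String> currentUsers = builder.table("users");

        new KafkaStreams(builder.build(), props).start();
    }
}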