Tags: java, spring-boot, apache-kafka, apache-kafka-streams

Understanding the Kafka Streams partition assignor


I have two topics, one with 3 partitions and one with 48.

Initially I used the default assignor, but I ran into problems when a consumer (a pod in Kubernetes) crashed.

What happened was that when the pod came back up, it was assigned a partition from the 3-partition topic and none from the 48-partition topic.

The two pods that did not crash were assigned 16 and 32 partitions from the 48-partition topic.

I've worked around this by using a round-robin partition assignor, but now I'm not confident in how the partitions are distributed, because I'm using KStream-KStream joins, and those require that each consumer is assigned the same partition numbers across both topics, e.g. C1:(t1:p0, t2:p0), C2:(t1:p1, t2:p1), etc.

One thing I thought of was that I could rekey the incoming events so they are repartitioned, and then I might be able to guarantee this?

Or maybe I don't understand how the default partitioning works... I'm confused.


Solution

  • Kafka Streams does not allow the use of a custom partition assignor. If you set one yourself, it will be overwritten with the StreamsPartitionAssignor [1]. This is needed to ensure that, if possible, partitions are re-assigned to the same consumers (a.k.a. stickiness) during rebalancing. Stickiness is important because it lets Kafka Streams reuse state stores on the consumer side as much as possible. If a partition is not reassigned to the same consumer, the state stores used by that consumer must be rebuilt from scratch after rebalancing.

    [1] https://github.com/apache/kafka/blob/9bd0d6aa93b901be97adb53f290b262c7cf1f175/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L989
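To see why co-partitioning matters for the join, recall that Kafka's default producer partitioner picks a partition as hash(keyBytes) mod the topic's partition count. The sketch below uses a simple stand-in hash (Kafka actually uses murmur2) to illustrate the point: the same key is guaranteed to land on the same partition number in two topics only when both topics have the same partition count, which is why a join between a 3-partition and a 48-partition topic cannot rely on same-keyed records being co-located. The class and method names here are hypothetical, chosen just for this illustration.

```java
import java.nio.charset.StandardCharsets;

public class CoPartitioningDemo {

    // Stand-in for Kafka's default partitioning: hash(keyBytes) mod numPartitions.
    // Kafka really uses murmur2; any deterministic hash demonstrates the idea.
    static int partitionFor(String key, int numPartitions) {
        int hash = 0;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            hash = 31 * hash + (b & 0xff);
        }
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        String[] keys = {"order-1", "order-2", "order-3", "order-4"};

        for (String k : keys) {
            // Same key, same partition count -> always the same partition,
            // so topics with equal partition counts are co-partitioned.
            if (partitionFor(k, 48) != partitionFor(k, 48)) {
                throw new AssertionError("deterministic hashing violated");
            }
            // Same key, different partition counts -> the indices generally
            // diverge, so the join cannot assume co-location.
            System.out.printf("%s -> p%d (of 3) vs p%d (of 48)%n",
                    k, partitionFor(k, 3), partitionFor(k, 48));
        }
    }
}
```

This is also why rekeying helps: a repartition writes both streams back through the same keyed partitioning into internal topics with matching partition counts, restoring the co-partitioning guarantee the join needs.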