Let's assume I have a stateful Kafka Streams application that consumes data from a topic with 3 partitions. At the moment I have 2 instances of this application running. Let's put it like this: instance1 has partitions part1 and part2 assigned, and instance2 has part3.
So now I want to add a new instance to fully utilize the available parallelism.
In my understanding, as soon as I start a new instance, a rebalance occurs: one of the partitions part1 or part2, together with its local state store, will be migrated from the existing instance to the newly added one. In this example, let's imagine that part1 migrates to instance3.
At the same time, I realize that the new instance instance3 will not start processing new data until it has restored the local state store from the changelog topic, which may take a long time.
During the period from starting the new instance until it has restored the state store:
- part1 is not being processed and is stuck until instance3 finishes starting up?
- [...] instance3 to build the local state store?
- [...] (instance1 - part2, instance2 - part3)?

Rebalancing has evolved with the recent releases:
- from version 2.4.0, with KIP-429 => part2 and part3 are not stuck and continue to be processed
- from version 2.6.0, with KIP-441 => part1 continues to be processed on instance1 until instance3 has rebuilt the state store for part1 and is ready to take over its processing
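
As far as I know, the warm-up behaviour that came with KIP-441 can be tuned through three Streams settings. Below is a minimal sketch that only builds those settings with plain java.util.Properties, so it runs without the Kafka client on the classpath. The class and method names are hypothetical, and the values are what I understand to be the defaults, so please check them against the documentation for your version before relying on them.

```java
import java.util.Properties;

// Hypothetical helper, not part of Kafka: collects the KIP-441 related
// settings so they can be merged into an existing Streams configuration.
public class WarmupConfigSketch {

    public static Properties warmupSettings() {
        Properties props = new Properties();
        // Maximum changelog lag (in offsets) at which an instance is still
        // considered caught up enough to be given the active task.
        props.put("acceptable.recovery.lag", "10000");
        // Maximum number of extra warm-up replicas restoring at the same time.
        props.put("max.warmup.replicas", "2");
        // How long to wait before triggering a follow-up rebalance that checks
        // whether the warm-up replicas have caught up.
        props.put("probing.rebalance.interval.ms", "600000");
        return props;
    }

    public static void main(String[] args) {
        // Print the settings; merge them into your real configuration instead.
        warmupSettings().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

In the example above, these settings would influence how long part1 stays on instance1 while instance3 restores its state in the background. They would be added to the same Properties object that already carries application.id and bootstrap.servers.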