I have read the explanation written here that one StreamTask handles all messages from co-partitioned topics: How do co-partitioning ensure that partition from 2 different topics end up assigned to the same Kafka Stream Task?
The question is: does the StreamTask handle them one by one, or in parallel?
We depend on this answer to ensure the consistency of the state store. Basically, we're afraid of ending up in a situation where messages with the same key from the two topics are processed at the same time, leaving us with two entries in the state store for the same key.
Also, I understand that a partition will be assigned to only one StreamTask across all the threads of the application, so handling of a partition won't be duplicated across instances. Is that a correct understanding?
Thanks in advance
The question is: does the StreamTask handle them one by one, or in parallel?
Yes, it would be one by one. A task is a single "atomic unit" of work and is executed by a single thread. The thread reads across the different topics in an interleaved manner, picking the record with the smallest timestamp across all available partitions (i.e., the same partition number across all input topics).
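To make the interleaving concrete, here is a minimal, self-contained sketch of that selection logic. It is not the actual Kafka Streams internals; the BufferedRecord type and the per-partition buffers are hypothetical stand-ins for the record queues a task keeps per input partition.

```java
import java.util.ArrayDeque;
import java.util.Comparator;
import java.util.List;
import java.util.Queue;

// Hypothetical illustration only: a single task interleaves records from the
// same partition number of several co-partitioned topics by always processing
// the buffered record with the smallest timestamp next.
public class TaskInterleavingSketch {

    record BufferedRecord(String topic, long timestamp, String key, String value) {}

    static void processNext(List<Queue<BufferedRecord>> partitionBuffers) {
        // Pick the buffer whose head record has the smallest timestamp.
        partitionBuffers.stream()
                .filter(buffer -> !buffer.isEmpty())
                .min(Comparator.comparingLong(buffer -> buffer.peek().timestamp()))
                .ifPresent(buffer -> {
                    BufferedRecord r = buffer.poll();
                    System.out.printf("processing %s@%d key=%s%n",
                            r.topic(), r.timestamp(), r.key());
                });
    }

    public static void main(String[] args) {
        Queue<BufferedRecord> topicA = new ArrayDeque<>(List.of(
                new BufferedRecord("topic-a", 10L, "k1", "a1"),
                new BufferedRecord("topic-a", 30L, "k1", "a2")));
        Queue<BufferedRecord> topicB = new ArrayDeque<>(List.of(
                new BufferedRecord("topic-b", 20L, "k1", "b1")));

        List<Queue<BufferedRecord>> buffers = List.of(topicA, topicB);
        for (int i = 0; i < 3; i++) {
            processNext(buffers);  // prints a1 (ts 10), then b1 (ts 20), then a2 (ts 30)
        }
    }
}
```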
Basically, we're afraid of ending up in a situation where messages with the same key from the two topics are processed at the same time, leaving us with two entries in the state store for the same key.
First of all, as it's single threaded, there is no concurrency. Secondly, I wanted to point out that "two entries ... with the same key" is not really possible for a key-value store anyway: even if there were parallel processing, the last put would win and overwrite the previous put.
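If you want to see the "last put wins" behavior for yourself, here is a minimal sketch using TopologyTestDriver. It assumes the kafka-streams-test-utils artifact is on the classpath, and "input-topic" and "my-store" are placeholder names.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class LastPutWinsSketch {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        // Materialize the topic as a key-value store named "my-store"
        // (topic and store names are placeholders for this sketch).
        builder.table("input-topic",
                Consumed.with(Serdes.String(), Serdes.String()),
                Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("my-store"));
        Topology topology = builder.build();

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "last-put-wins-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234");
        // Disable record caching so the store reflects every update immediately.
        props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);

        try (TopologyTestDriver driver = new TopologyTestDriver(topology, props)) {
            TestInputTopic<String, String> input = driver.createInputTopic(
                    "input-topic", new StringSerializer(), new StringSerializer());
            input.pipeInput("k1", "first");
            input.pipeInput("k1", "second");

            KeyValueStore<String, String> store = driver.getKeyValueStore("my-store");
            // Only one entry per key: the later put overwrote the earlier one.
            System.out.println(store.get("k1"));  // prints "second"
        }
    }
}
```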
a partition will be assigned to only one StreamTask
The model is that Kafka Streams creates tasks. These tasks are created based on the structure of the topology (e.g., tasks are created per sub-topology) and the (sub-topology) input topic partitions. Later, tasks are assigned to StreamThreads for processing.
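As an illustration of that model, here is a sketch of a topology that joins two co-partitioned topics; the topic names are placeholders and a reasonably recent Kafka Streams version is assumed. Printing Topology#describe() shows the sub-topologies from which tasks are derived (one task per sub-topology and partition number).

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.StreamJoined;

public class SubTopologySketch {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Two co-partitioned input topics (placeholder names). The join forces
        // both topics into the same sub-topology, so partition N of "topic-a"
        // and partition N of "topic-b" end up in the same task.
        KStream<String, String> left =
                builder.stream("topic-a", Consumed.with(Serdes.String(), Serdes.String()));
        KStream<String, String> right =
                builder.stream("topic-b", Consumed.with(Serdes.String(), Serdes.String()));

        left.join(right,
                  (l, r) -> l + "/" + r,
                  JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(5)),
                  StreamJoined.with(Serdes.String(), Serdes.String(), Serdes.String()))
            .to("output-topic", Produced.with(Serdes.String(), Serdes.String()));

        // Shows the sub-topologies; at runtime, one task is created per
        // sub-topology and input partition, and those tasks are then assigned
        // to the available StreamThreads.
        System.out.println(builder.build().describe());
    }
}
```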