parallel-processingstreamingapache-flinkdata-stream

Flink streaming: Do the events get distributed to each task slots separately according to their keys?


So for example if I have events in order with key A and events in order with key B and a parallelism of 2. Do all the events with key A go to one task slot and key B ones go to the other task slot?

What happens if i only get events in order with key A. Do they also get distributed to the two task slots. Does that mean i lose the order in which they come?


Solution

  • No, that's not exactly how it works.

    What happens is that each key is mapped onto a key group, where the total number key groups is determined by the cluster's maximum parallelism (a configuration setting). And then key groups are mapped onto task slots. If there are two keys and two slots, it's entirely possible that both keys will be assigned to the same slot.

    The key group for key is:

    MathUtils.murmurHash(key.hashCode()) % maxParallelism
    

    And the slot for a key group is:

    keyGroup * actualParallelism / maxParallelism
    

    As for maintaining ordering, see https://stackoverflow.com/a/69094404/2000823 and https://stackoverflow.com/a/69757412/2000823.