javastreamapache-flink

Flink KeyedProcessFunction Creation Count


I am new to Flink and trying to understand if the number of created KeyedProcessFunction instances change depending on where I created the function.

MyProcessFunction myFunction = new MyProcessFunction()
events.keyBy(value -> value.getKey())
      .process(myFunction)

What I understand from the documentation is that if I create it like this instead

events.keyBy(value -> value.getKey())
      .process(new MyProcessFunction())

it will create a MyProcessFunction for every element in the stream. I wrote a test code though and it created only a single one. Am I missing something? Also if it's creating more than one, is it creating one for every element or one for every KeyedStream created in keyBy method?


Solution

  • The two variant declarations you have (variable vs. inline declaration) should resolve to the same job graph, so there shouldn't be any difference there outside of syntactic preferences.

    Also if it's creating more than one, is it creating one for every element or one for every KeyedStream created in keyBy method?

    Given that you are keying the stream, there will be a separate instance of your MyProcessFunction created for each key within your stream, such that elements with the same key will flow through the same instance of the operator (allowing state to be shared between them).