I have a working Flink job built on Flink Data Stream. I want to REWRITE the entire job based on the Flink stateful functions 3.1.
I have read the doc and got some ideas. My plans are:
Kafka Ingress
(https://nightlies.apache.org/flink/flink-statefun-docs-release-3.0/docs/io-module/apache-kafka/) handles itMinIO
is not in the list of default Flink I/O Connectors
, therefore I need to write my custom Flink I/O Connector
for MinIO
myself, according to https://nightlies.apache.org/flink/flink-statefun-docs-release-3.0/docs/io-module/flink-connectors/Embedded module
because it prevents scaling. Auto scaling is the key reason why I want to migrate to Flink stateful functions
I don't feel confident with my plan. Is there anything wrong with my understandings/plan?
Are there any best practice I should refer to?
I want to get rid of windows during the rewrite, but I don't know how
Background: Use KeyedProcessFunctions Rather than Windows to Assemble Related Events
With the DataStream API, windows are not a good building block for assembling together related events. The problem is that windows begin and end at times that are aligned to the clock, rather than being aligned to the events. So even if two related events are only a few milliseconds apart they might be assigned to different windows.
In general, it's more straightforward to implement this sort of use case with keyed process functions, and use timers as needed to deal with missing or late events.
Doing this with the Statefun API
You can use the same pattern mentioned above. The function id will play the same role as the key, and you can use a delayed message instead of a timer: