apache-flink sequence kafka-topic

Apache Flink with multiple Kafka sources: ensure one topic is fully read before consuming data from the other topic


When working with Kafka Streams, I know that a GlobalKTable is, by definition, fully populated before processing of the other sources starts.

I'm looking for similar functionality in Apache Flink. Topic one holds configuration data that is almost static. I want Flink to fully consume this topic before it even starts reading from topic two. Topic one contains ~5 million records with a total size of around 600 MB.

Is there a way to achieve this, or would I need to buffer the data from topic two until I have matching data from topic one?
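To make the buffering alternative concrete, here is a minimal sketch of the per-key logic in plain Java, with no Flink dependency. The class name `ConfigGate`, its method names, and the `event|config` output format are all hypothetical. In an actual Flink job the two maps would presumably live in keyed state, and the two handlers would correspond to the two inputs of a `KeyedCoProcessFunction` over the connected streams.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Plain-Java model of "buffer topic two until topic one has matching data".
 * Hypothetical names throughout; this is not Flink API.
 */
class ConfigGate {
    // Latest configuration value per key (records from topic one).
    private final Map<String, String> configByKey = new HashMap<>();
    // Events from topic two that arrived before their configuration.
    private final Map<String, List<String>> pendingByKey = new HashMap<>();

    /** Handles a record from topic one and returns any events it releases. */
    List<String> onConfig(String key, String config) {
        configByKey.put(key, config);
        List<String> released = new ArrayList<>();
        List<String> pending = pendingByKey.remove(key);
        if (pending != null) {
            for (String event : pending) {
                released.add(enrich(event, config));
            }
        }
        return released;
    }

    /** Handles a record from topic two: emits it enriched, or buffers it. */
    List<String> onEvent(String key, String event) {
        List<String> out = new ArrayList<>();
        String config = configByKey.get(key);
        if (config == null) {
            // No configuration yet for this key: hold the event back.
            pendingByKey.computeIfAbsent(key, k -> new ArrayList<>()).add(event);
        } else {
            out.add(enrich(event, config));
        }
        return out;
    }

    /** Number of events currently held back across all keys. */
    int pendingCount() {
        int count = 0;
        for (List<String> events : pendingByKey.values()) {
            count += events.size();
        }
        return count;
    }

    // Illustrative enrichment only: joins event and config with a separator.
    private static String enrich(String event, String config) {
        return event + "|" + config;
    }
}
```

One caveat of this approach: an event whose key never appears in topic one stays buffered indefinitely, so a real job would likely need a timeout or a size limit on the pending buffer.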


Solution

  • As described in another thread (Provide initial state to Apache Flink Application), this was solved with a separate init deployment that consumes the topic and writes the data to Flink state.

    A savepoint is then created, and the actual application is started from this savepoint, so the configuration data is already in state.
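
The two-phase idea can be modelled in plain Java as follows. This is only a sketch of the control flow, not Flink code: `BootstrapModel` and its methods are hypothetical names, and a serialized `HashMap` stands in for the savepoint. In Flink itself, the snapshot would be a real savepoint of the init job (for example taken with `flink savepoint <jobId>`), and the main application would be started from it with `flink run -s <savepointPath>`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Plain-Java model of "init job writes state, main job starts from its snapshot". */
class BootstrapModel {

    /** Phase 1 (init deployment): load all of topic one into state, then snapshot it. */
    static byte[] runInitJob(Map<String, String> topicOne) {
        HashMap<String, String> state = new HashMap<>(topicOne);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state); // stand-in for taking a savepoint
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Phase 2, step 1 (main application): restore state from the snapshot. */
    @SuppressWarnings("unchecked")
    static Map<String, String> restore(byte[] snapshot) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(snapshot))) {
            return (Map<String, String>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Phase 2, step 2: process a topic-two event against the restored state. */
    static String process(Map<String, String> state, String key, String event) {
        String config = state.get(key);
        return config == null ? event + "|<no config>" : event + "|" + config;
    }
}
```

Because the configuration is already present when the main application starts, events from topic two do not need to be buffered for keys that existed at bootstrap time. Later changes to topic one would still have to be consumed by the main application to keep the state current.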