apache-flinkapache-beamamazon-kinesis-analytics

Flink-Kafka Flink job reading kafka records during startup and failing to start on AWS-KDA


Running a Flink-Beam job on KDA (kakfa --> flink(beam) --> ElasticSearch) the simple job wont start on KDA and goes to infinite loop. The AWS KDA Support replied saying the Job reads records during startup which is the cause of failure.

The dockerized version of the app runs smooth with 3 taskmanagers in kubernetes but not on KDA. As KDA has 2 minute timeout to start a job.

By my understanding Flink starts reading records once the job starts, how do i reduce startup lesser than 2 minutes, as the job is very basic reading records from kafka and store to ES.


Solution

  • I resolved the issue, basically Beam uses direct runner as default.

    it is important to set --runner=FlinkRunner to start your job as a flink job.

    otherwise the job is in infinite loop of reading from kafka topic.