machine-learning, apache-flink, streaming, dependency-management, pyflink

Dependency management and execution environment in Apache Flink


We are evaluating Apache Flink for deploying streaming ML applications.

How is dependency management handled in Apache Flink, and in particular, how does it interact with the execution environment?

Imagine that Python tasks with different dependencies are to be submitted to the Flink cluster.

So far we only see that the Flink Task Manager can handle dependency management through Python virtual environments. If every task has different dependencies, should we deploy a new Task Manager for every task?
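For what it's worth, PyFlink does appear to expose per-job dependency APIs, so dependencies travel with the submitted job rather than being baked into a Task Manager. A minimal configuration sketch of what we found (file names like `requirements.txt` and `venv.zip` are placeholders; the `get_config().set(...)` style assumes a reasonably recent Flink release):

```python
# Sketch: per-job Python dependency management in PyFlink.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Option 1: ship a requirements.txt with the job; the Python workers
# install the listed packages before running this job's UDFs.
t_env.set_python_requirements(requirements_file_path="requirements.txt")

# Option 2: ship a pre-built virtual environment as an archive and
# point the Python workers of this job at its interpreter.
t_env.add_python_archive("venv.zip")
t_env.get_config().set("python.executable", "venv.zip/venv/bin/python")
```

If this works as documented, each job carries its own environment, which would sidestep the one-Task-Manager-per-task question.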

Coming from a container setup, we could deploy every task inside a separate Docker image.

How is this usually handled with Apache Flink? Flink does not seem great at handling a huge number of tasks that each need their own specific dependencies, but we would still like to make use of the streaming processor.
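For context, the same per-job dependencies can apparently also be attached at submission time through the `flink run` CLI, so each job submitted to a shared cluster brings its own environment. A hedged sketch, with `job.py`, `requirements.txt`, and `venv.zip` as placeholder files:

```shell
# Submit a PyFlink job together with its own dependency set.
# -pyreq ships a requirements.txt that workers install for this job;
# -pyarch ships a pre-built venv; -pyexec selects its interpreter.
flink run \
  -py job.py \
  -pyreq requirements.txt \
  -pyarch venv.zip \
  -pyexec venv.zip/venv/bin/python
```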


Solution

  • The solution is to use a more modern Python-native alternative: