apache-flink

Elastic Scaling in Apache Flink and minimum parallelism


Is there a way to define a minimum parallelism for periods when the pipeline stays idle (no payload) for a long time, so that latency stays low when the payload appears again?

Currently, when no data is coming into the pipeline (no payload reaching the source) and the pipeline's CPU load is around 0, the adaptive scaling drops the parallelism of all operators to 1. However, I would like the minimum parallelism to stay at 8, so that the latency is adequate when the payload appears again.

I am trying out Elastic Scaling in Adaptive Mode for our Flink pipelines with Apache Flink 1.18 and flink-kubernetes-operator 1.9. The scaling settings of my FlinkDeployment are:

spec:
  flinkConfiguration:
    cluster.evenly-spread-out-slots: 'true'
    job.autoscaler.catch-up.duration: 1m
    job.autoscaler.enabled: 'true'
    job.autoscaler.metrics.window: 1m
    job.autoscaler.restart.time: 2m
    job.autoscaler.scaling.enabled: 'true'
    job.autoscaler.stabilization.interval: 1m
    job.autoscaler.target.utilization: '0.6'
    job.autoscaler.target.utilization.boundary: '0.2'
    jobmanager.scheduler: adaptive
    parallelism.default: '8'
    taskmanager.numberOfTaskSlots: '8'

P.S. Our use case: our pipelines receive no payload for half of the day, and then a lot of data comes in. The operators inside the pipelines are very CPU-intensive, so the parallelism has to be increased quickly.


Solution

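  • According to the flink-kubernetes-operator configuration documentation (see: https://nightlies.apache.org/flink/flink-kubernetes-operator-docs-main/docs/operations/configuration/)

    you can set job.autoscaler.vertex.min-parallelism, which is the minimum parallelism the autoscaler is allowed to use. Its default is 1, which matches the scale-down to 1 that you observe while the pipeline is idle.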
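
Applied to the FlinkDeployment from the question, this might look like the sketch below. The value 8 is taken from the question, the remaining keys are copied from the original settings (the rest are omitted for brevity), and the snippet has not been tested against operator 1.9, so it is worth checking the option against the documentation for your version.

```yaml
spec:
  flinkConfiguration:
    job.autoscaler.enabled: 'true'
    job.autoscaler.scaling.enabled: 'true'
    # Lower bound for the parallelism the autoscaler may pick (default: 1).
    # 8 is the value requested in the question, not a general recommendation.
    job.autoscaler.vertex.min-parallelism: '8'
    jobmanager.scheduler: adaptive
    parallelism.default: '8'
    taskmanager.numberOfTaskSlots: '8'
```

With this in place the autoscaler should still scale up under load, but should not go below 8 during idle periods. The trade-off is that the idle pipeline keeps holding the slots needed for parallelism 8.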