[SOLVED] Single request spinning 2 Knative instances when setting concurrency limit to 1

I am trying to achieve a scenario where 5 curl requests spin up 5 pods. I have been playing around with the Knative concurrency settings for autoscaling. One thing I noticed is what happens when I set

autoscaling.knative.dev/target: "1"

After I curl a single request, it spins up 2 pods. With 5 requests, Knative spins up about 9 to 10 pods. I also tried setting the hard concurrency limit.

spec:
  containerConcurrency: 1

Same behavior as well.

However, when I set the limit to 2, 5 requests spin up about 4 pods, which is not what I want either.

How can I achieve 1 request per pod for my application?

Solution

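By default, Knative uses a target utilization of 70%, so the autoscaler scales on 70% of the configured concurrency target rather than the full value. With a target of 1, the effective target is 0.7 concurrent requests per pod, so a single request already calls for roughly ceil(1 / 0.7) = 2 pods. With a target of 2, the effective target is 1.4, and 5 requests call for roughly ceil(5 / 1.4) = 4 pods, which matches what you observed. See https://knative.dev/docs/serving/autoscaling/concurrency/#target-utilization.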

You might want to try setting the utilization to 100% to run completely hot.
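
As a minimal sketch of what that could look like, the per-revision annotation autoscaling.knative.dev/target-utilization-percentage can be set next to the concurrency settings from the question. The service name and image below are hypothetical placeholders, not taken from the original post:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-app                      # hypothetical service name
spec:
  template:
    metadata:
      annotations:
        # Soft concurrency target, as in the question
        autoscaling.knative.dev/target: "1"
        # Scale on 100% of the target instead of the default 70%
        autoscaling.knative.dev/target-utilization-percentage: "100"
    spec:
      # Hard limit: at most one in-flight request per pod
      containerConcurrency: 1
      containers:
        - image: example.com/my-app:latest   # hypothetical image
```

Even at 100% utilization, the autoscaler works from concurrency averaged over a time window, so the pod count may still briefly differ from the exact number of in-flight requests.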