I am trying to set up a scenario where 5 curl requests spin up 5 pods. I have been playing around with the Knative concurrency settings for autoscaling. One thing I noticed is that when I set
autoscaling.knative.dev/target: "1"
a single curl request spins up 2 pods. With 5 requests, Knative spins up about 9-10 pods. I also tried setting a hard concurrency limit:
spec:
containerConcurrency: 1
The behavior is the same.
However, when I set the limit to 2, 5 requests spin up about 4 pods, which is not what I want either.
How can I achieve 1 request per pod for my application?
By default, Knative uses a target utilization of 70%, which scales the concurrency target down to 70% of the configured value. In other words, the autoscaler aims to keep each pod at 70% of its target for the current load, so a target of 1 is effectively treated as 0.7 concurrent requests per pod. See https://knative.dev/docs/serving/autoscaling/concurrency/#target-utilization.
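As a rough illustration of why this over-provisions, here is a simplified model of the pod count. It is an assumption for illustration only: it ignores the autoscaler's averaging windows and panic mode, so it will not match observed numbers exactly.

```latex
\[
\text{pods} \approx \left\lceil \frac{\text{concurrent requests}}{\text{target} \times \text{utilization}} \right\rceil
\]
\[
\left\lceil \frac{1}{1 \times 0.7} \right\rceil = 2, \qquad
\left\lceil \frac{5}{1 \times 0.7} \right\rceil = 8, \qquad
\left\lceil \frac{5}{2 \times 0.7} \right\rceil = 4, \qquad
\left\lceil \frac{5}{1 \times 1.0} \right\rceil = 5
\]
```

This lines up with the 2 pods for one request and the 4 pods for a limit of 2. The 9-10 pods reported for 5 requests is higher than the 8 this model gives, which it does not explain.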
You might want to try setting the utilization to 100% to run completely hot.
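A minimal sketch of what that could look like, combining the hard limit from the question with the per-revision utilization annotation described on the linked docs page. The service name and image are hypothetical placeholders, not values from the question.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: my-app  # hypothetical name
spec:
  template:
    metadata:
      annotations:
        # Use the full concurrency target instead of the default 70%
        autoscaling.knative.dev/target-utilization-percentage: "100"
    spec:
      # Hard limit: at most 1 in-flight request per pod
      containerConcurrency: 1
      containers:
        - image: example.com/my-app:latest  # hypothetical image
```

Even with this in place, short bursts may still briefly produce more pods than requests, so it is worth verifying against your own workload.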