I have started exploring Knative recently and I am trying to understand how concurrency and autoscaling work. I understand that (target) concurrency refers to the number of requests that can be scheduled to a single Pod for a given revision at the same time.
However, I am not sure I understand what the impact is of setting concurrency to a value greater than 1. What happens when N requests are scheduled to the same Pod? Will they be processed one at a time in FIFO order? Will multiple threads be spawned to serve them in parallel (possibly competing for CPU resources)?
I am tempted to set concurrency=1 and rely on autoscaling to handle multiple requests through multiple Pods, but I guess this is not the best thing to do.
Thanks in advance
containerConcurrency is an argument to the Knative infrastructure indicating how many requests your container can handle at once.
In AWS Lambda and some other Function-as-a-Service offerings, each instance will only ever process a single request. This can be simpler to manage, but some languages (Java and Golang, for example) easily support multiple requests concurrently using threaded request models. Platforms like Cloud Foundry and App Engine support this larger concurrency, but not the "function" model of code transformation.
Knative is somewhere between these two; since you can bring your own container, you can build an application container which is single-threaded like Lambda expects and set containerConcurrency to 1, or you can create a multi-threaded container and set containerConcurrency higher.