google-cloud-platform google-app-engine google-cloud-run

What is the best way to determine the maximum concurrent requests per instance for your App Engine or Cloud Run app?


On Google Cloud App Engine and Google Cloud Run, what is the best way to determine the value for maximum concurrent requests per instance? The default for App Engine max_concurrent_requests is 10 and the default for Cloud Run is 80.

Ideally, you would want as large a value as possible, so that as few additional instances as possible are spun up.

Is there a tool you can use to see how many requests you are averaging per instance and base it on that?

Alternatively, would it be better to set maximum concurrent requests to the maximum value of 1000 and then manage scaling based solely on CPU usage? On App Engine, this would mean setting the target_cpu_utilization value.


Solution

  • Personally, I start with a value based on my own intuition. Then I observe the golden metrics (CPU usage, latency, memory) to make sure that one instance is strong enough for typical traffic.

    Another very important metric is the cold start.

    From a financial perspective, small instances are more attractive because each additional instance adds only a small amount of resources (and cost). If you have a BIG instance, each scale-out increment will cost a lot!

    It's an optimization under constraints, and the best way is to test and experiment.
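
As a starting point for that experimentation, the average number of in-flight requests per instance can be estimated from metrics you already observe. The sketch below uses Little's law (mean in-flight requests = request rate × mean latency). The function names, the 1.5 headroom factor, and the sample numbers are illustrative assumptions, not values recommended by App Engine or Cloud Run; the cap of 1000 is the maximum mentioned in the question.

```python
import math


def average_concurrency(requests_per_second: float, mean_latency_seconds: float) -> float:
    """Little's law: mean in-flight requests = arrival rate * mean time in system."""
    return requests_per_second * mean_latency_seconds


def suggested_concurrency_limit(
    requests_per_second: float,
    mean_latency_seconds: float,
    instance_count: int,
    headroom: float = 1.5,   # assumption: arbitrary safety margin for bursts
    hard_cap: int = 1000,    # maximum value mentioned in the question
) -> int:
    """Rough first guess for the per-instance concurrency setting.

    Divides the service-wide average concurrency across the observed
    instances, adds headroom, and clamps the result to [1, hard_cap].
    """
    per_instance = average_concurrency(requests_per_second, mean_latency_seconds) / instance_count
    return max(1, min(hard_cap, math.ceil(per_instance * headroom)))


if __name__ == "__main__":
    # Hypothetical numbers read off a monitoring dashboard.
    print(average_concurrency(200, 0.25))
    print(suggested_concurrency_limit(200, 0.25, instance_count=5))
```

Treat the result only as an initial value: apply it, then watch CPU usage, latency, memory, and cold starts under real traffic and adjust from there.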