I wrote a small script to test this because my app is essentially unusable with such slow API requests. I reduced my FastAPI app down to roughly this:
import asyncio

from fastapi import BackgroundTasks, FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

@app.post("/")
async def handle_jsonrpc(request: Request, background_tasks: BackgroundTasks):
    # This starts a task that hits "example.com" 100 times
    asyncio.create_task(latency_test())
    return JSONResponse("ok", status_code=200)
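For reference, latency_test is essentially 100 concurrent httpx GETs with per-request timing. A simplified sketch along those lines (the generous timeout is an assumption, so slow requests don't error out):

import logging
import time

import httpx

logger = logging.getLogger("main")

async def timed_get(client: httpx.AsyncClient, i: int) -> None:
    # Time a single GET and log it in the same format as the output below.
    start = time.perf_counter()
    response = await client.get("https://www.example.com")
    elapsed = time.perf_counter() - start
    logger.info("Request %03d: %.4f seconds (status %d)", i, elapsed, response.status_code)

async def latency_test() -> None:
    # Fire all 100 requests concurrently over one shared connection pool.
    async with httpx.AsyncClient(timeout=60) as client:
        await asyncio.gather(*(timed_get(client, i) for i in range(1, 101)))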
I get this result on Google Cloud Run:
2025-05-15 18:37:33 INFO:httpx:HTTP Request: GET https://www.example.com "HTTP/1.1 200 OK"
2025-05-15 18:37:33 INFO:main:Request 095: 0.0222 seconds (status 200)
2025-05-15 18:37:32 INFO:main:Request 084: 20.1998 seconds (status 200)
2025-05-15 18:37:32 INFO:main:Request 088: 12.0986 seconds (status 200)
2025-05-15 18:37:39 INFO:main:Request 100: 5.3776 seconds (status 200)
2025-05-15 18:37:39 INFO:main:Request 081: 39.6005 seconds (status 200)
2025-05-15 18:37:39 INFO:main:Request 085: 24.9007 seconds (status 200)
On Google Cloud: Avg latency per request: 13.4155 seconds.
On my local machine: Avg latency per request: 0.0245 seconds (547x faster)
Note: The latency is always fast at the beginning, but then drops off all of a sudden. Sometimes it happens on request #20, sometimes on request #80. A couple of times I made it through all 100 requests at speeds that match my local machine. Even if I run latency_test twice back to back, both runs start out fast and then slow down at a seemingly random point.
I followed the instructions to create a static IP and route all egress traffic through it: https://cloud.google.com/run/docs/configuring/networking-best-practices#performance
I verified the egress IP by creating an endpoint that calls response = await client.get("https://ifconfig.me/ip").
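The check looked roughly like this (the route name is just for illustration):

import httpx
from fastapi.responses import PlainTextResponse

@app.get("/egress-ip")
async def egress_ip():
    # Return the IP that Cloud Run uses for outbound traffic.
    async with httpx.AsyncClient() as client:
        response = await client.get("https://ifconfig.me/ip")
    return PlainTextResponse(response.text)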
But the result is the same. I upped memory and vCPUs and the result is the same (as expected, because making 100 requests should be fast even on a 486).
I suspect that I'm missing something basic, but I'm a bit stumped as to what that could be.
Since Cloud Run is a "serverless" platform, you're not guaranteed CPU or network access when you're not serving an HTTP request. Because you fire off the task with asyncio.create_task and return immediately, most of the work runs after the HTTP response has been sent, which is most likely why the first few requests are fast and the rest are super slow. There is a setting called "Instance billing" that allows CPU to always be allocated, which should let the background task keep running (together with a minimum instance count of 1).
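A quick way to confirm the throttling theory without changing any settings: await the work inside the handler so it finishes while the request is still being served, which is when Cloud Run guarantees CPU. A sketch, assuming the latency_test from your snippet (the route name is hypothetical):

@app.post("/await-test")
async def handle_awaited(request: Request):
    # Doing the work before returning keeps it inside the request window,
    # where CPU is allocated even with request-based billing.
    await latency_test()
    return JSONResponse("ok", status_code=200)

If this version stays fast through all 100 requests, CPU throttling outside the request is almost certainly the cause.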
Keep in mind that always-allocated CPU will make your service more expensive if you don't have a lot of traffic.