I've written code to call the PageSpeed Insights API.
build_cwv_data is an async coroutine that calls the API, then retrieves and processes the JSON data for a particular URL.
According to the documentation, the API has a limit of 400 requests per 100 seconds. Interestingly, it is at around the 100-second mark that the API starts returning a 409 error status code (quota exceeded).
My code is doing approximately 775 calls in 100 seconds.
I don't understand how it is making so many calls in that time period as I have added sleep delays to try to slow it down.
Firstly, why is it still so fast? What can I do to slow it down?
```python
async def retrieve_cwv_data(urls_list):
    site_id = 10234
    tasks = []
    rate_limit = 2  # maximum number of API calls per second
    interval = 1 / rate_limit  # interval between API calls in seconds
    count = 0
    start_time = time.monotonic()  # initial start time
    for url in urls_list:
        task1 = asyncio.ensure_future(build_cwv_data(site_id, url, 'mobile', psi_key))
        task2 = asyncio.ensure_future(build_cwv_data(site_id, url, 'desktop', psi_key))
        tasks.append(task1)
        tasks.append(task2)
        count += 2
        if count >= rate_limit * 2:
            elapsed_time = time.monotonic() - start_time
            if elapsed_time < interval:
                # introduce delay to stay within the rate limit
                await asyncio.sleep(interval - elapsed_time)
            # reset count and start time for the next second
            count = 0
            start_time = time.monotonic()
    results = await asyncio.gather(*tasks)
    tmp_list = []
    for result in results:
        tmp_list.append(result)
    return tmp_list
```
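You put the pause in the loop that creates the tasks, but creating a task does not run it: tasks only start executing when your code hands control to the asyncio loop at an `await`, which here is either the `asyncio.sleep` inside your loop or the final call to `asyncio.gather`. At that point all pending tasks run as fast as possible (each task starts as soon as the previous one has sent its request and is awaiting the response). In other words, each batch of calls goes out as fast as your computer can go, and the pauses only delay when the tasks start making requests, which is why the quota error shows up around the 100-second mark. Note also that your loop pauses only once every `rate_limit * 2` = 4 calls, and for at most `interval` = 0.5 seconds, so it allows about 8 calls per second, or roughly 800 per 100 seconds, which is close to the 775 you measured.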
Always keep in mind that asyncio code is just regular, serialized code, running in a single thread with explicit pause points: no code other than what you are looking at ever runs unless you reach one of those pause points (or, of course, delegate something to another thread or process). The pause points are the `await` keyword, `async for` and `async with`.
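A minimal sketch of that behaviour, using a hypothetical `worker` coroutine and a shared `log` list (neither is part of the code in the question). The tasks are created first, but none of them runs until `main` reaches an `await`:

```python
import asyncio


async def worker(name, log):
    # Hypothetical stand-in for build_cwv_data: it only records that it ran.
    log.append(f"{name} started")


async def main():
    log = []
    # Creating the tasks only schedules them; no worker has run yet.
    tasks = [asyncio.ensure_future(worker(f"task{i}", log)) for i in range(3)]
    log.append("tasks created")
    # First pause point: control goes to the event loop, which runs the tasks.
    await asyncio.sleep(0)
    log.append("after first await")
    await asyncio.gather(*tasks)
    return log


log = asyncio.run(main())
print(log)
```

The "tasks created" entry is logged before any worker starts, even though the tasks were created first.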
You have to change your code so that the limit applies to the calls themselves. One way to do that is a semaphore that caps the number of concurrent in-flight requests at 400, with each task holding its slot until the 100-second window has passed.
```python
import asyncio
from asyncio import Semaphore
from time import time

call_semaphore = None
timeout = 100  # length of the rate-limit window, in seconds
...

async def makecall(*args):
    async with call_semaphore:
        start = time()
        result = await build_cwv_data(*args)
        elapsed = time() - start
        # keep holding this semaphore slot until the window has passed,
        # so that it is only freed in keeping with the rate limit:
        await asyncio.sleep(max(0, timeout - elapsed))
        return result

async def retrieve_cwv_data(urls_list):
    global call_semaphore
    # at most 400 calls may hold a slot at any one time
    call_semaphore = Semaphore(400)
    site_id = 10234
    tasks = []
    for url in urls_list:
        task1 = asyncio.ensure_future(makecall(site_id, url, 'mobile', psi_key))
        task2 = asyncio.ensure_future(makecall(site_id, url, 'desktop', psi_key))
        tasks.append(task1)
        tasks.append(task2)
    results = await asyncio.gather(*tasks)
    tmp_list = []
    for result in results:
        tmp_list.append(result)
    return tmp_list
```
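To see the semaphore approach in action without calling the real API, here is a scaled-down, self-contained sketch. The names `fake_build_cwv_data`, `LIMIT` and `WINDOW` are assumptions made for illustration: the stub stands in for build_cwv_data, and the limit of 2 calls per 0.2 seconds stands in for 400 calls per 100 seconds.

```python
import asyncio
import time

LIMIT = 2      # stand-in for the real limit of 400 calls
WINDOW = 0.2   # stand-in for the real 100-second window


async def fake_build_cwv_data(url, strategy, started):
    # Hypothetical stub: records when the "request" starts, then
    # pretends to wait for a response.
    started.append(time.monotonic())
    await asyncio.sleep(0.01)
    return (url, strategy)


async def makecall(semaphore, *args):
    async with semaphore:
        start = time.monotonic()
        result = await fake_build_cwv_data(*args)
        elapsed = time.monotonic() - start
        # Hold the slot until a full window has passed since the call began.
        await asyncio.sleep(max(0, WINDOW - elapsed))
        return result


async def main():
    semaphore = asyncio.Semaphore(LIMIT)
    started = []
    calls = [
        makecall(semaphore, url, strategy, started)
        for url in ("a", "b", "c")
        for strategy in ("mobile", "desktop")
    ]
    results = await asyncio.gather(*calls)
    return results, started


results, started = asyncio.run(main())
# Gap between each call and the call LIMIT positions later.
gaps = [later - earlier for earlier, later in zip(started, started[LIMIT:])]
print(results)
print([round(gap, 2) for gap in gaps])
```

Because every slot is held for a full window, no more than `LIMIT` calls can start within any `WINDOW` seconds, so each printed gap should be at least about `WINDOW`. Here the semaphore is passed as an argument instead of being kept in a global, which is only a convenience for the sketch.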