I created a generator to perform pagination on an API:
def page_helper(req, timeout=5, page=1, **kwargs):
    print(f"Page {page}", end="\r")
    try:
        response = req(params={**kwargs, "page": page})
        response = response.json()
    except Exception as e:
        status = response.status_code
        if status == "429":
            print(f"Rate limited. Waiting {timeout} seconds.")
            time.sleep(timeout)
            yield from page_helper(req, page=page, **kwargs)
        else:
            raise e
    else:
        if len(response) == kwargs["limit"]:
            yield from page_helper(req, page=page + 1, **kwargs)
        yield response
Later, I use this generator somewhere like this:
batches = page_helper(<some_request>, limit=100)

# get inserts and updates per batch
for i, batch in enumerate(batches):
    print(f"Batch {i + 1}", end="\r")
    insert_batch = []
    update_batch = []
    # ... process batch
I want it to fetch each page as a batch and process it before it fetches the next batch. Fetching the batches works perfectly, but it keeps on fetching pages without processing in between.
I tried to check the generator by calling next, and I expected it to return only one batch. However, it starts the full iteration immediately:
next(batches) # --> Performs full iteration
next(batches)
next(batches)
next(batches)
Is there something wrong with my generator function?
Why is your generator recursive to begin with? That recursion is also exactly why `next()` appears to perform the full iteration: the recursive `yield from page_helper(...)` runs *before* `yield response`, so to produce its first value the generator has to fetch every remaining page first, and it then yields the pages in reverse order (last page first).
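You can see the effect with a minimal toy sketch, no HTTP involved (the print stands in for the network fetch, and `eager_pages` is just an illustrative name):

def eager_pages(page=1, last_page=3):
    print(f"fetching page {page}")  # the "fetch" runs before anything is yielded
    if page < last_page:
        # recursion sits before the first yield, exactly like in page_helper
        yield from eager_pages(page + 1, last_page)
    yield page

g = eager_pages()
next(g)  # prints "fetching page 1", 2 and 3, then returns 3 (the last page!)

Just use a loop to do your pagination instead. If I've understood your intent correctly, this should do what you want: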
import time  # needed for time.sleep below

def get_all_pages(request, retry_interval_seconds=5, start_page=1, **kwargs):
    current_page = start_page
    while True:
        response = request(params={**kwargs, "page": current_page})
        if response.status_code == 429:  # status_code is an int, not the string "429"
            time.sleep(retry_interval_seconds)
            continue  # try again without incrementing current_page
        data = response.json()
        yield data  # hand one page to the caller before fetching the next
        if len(data) != kwargs["limit"]:
            break  # a short page means we've reached the last one
        current_page += 1
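Because `yield data` now sits directly inside the loop, each call to `next()` fetches exactly one page and then suspends until you ask for the next one. A quick usage sketch (`some_request` stands in for whatever request callable you pass, and `limit` must match your API's page size):

batches = get_all_pages(some_request, limit=100)
first = next(batches)   # fetches page 1 only
second = next(batches)  # fetches page 2 only, after you've processed the first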