Tags: python, iterator, generator, yield

Python - Generator not working with next method


I created a generator to perform pagination on an API:

def page_helper(req, timeout=5, page=1, **kwargs):
    print(f"Page {page}", end="\r")
    try:
        response = req(params={**kwargs, "page": page})
        response = response.json()

    except Exception as e:
        status = response.status_code

        if status == "429":
            print(f"Rate limited. Waiting {timeout} seconds.")
            time.sleep(timeout)
            yield from page_helper(req, page=page, **kwargs)
        else:
            raise e

    else:
        if len(response) == kwargs["limit"]:
            yield from page_helper(req, page=page + 1, **kwargs)

        yield response

Later I use this generator like this:

batches = page_helper(<some_request>, limit=100)


# get insert and updates per batch
for i, batch in enumerate(batches):
    print(f"Batch {i + 1}", end="\r")
    insert_batch = []
    update_batch = []

    # ... process batch

I want it to fetch each page as a batch and process it before it fetches the next batch. Fetching the batches works perfectly, but it keeps fetching pages without processing in between.

I tried to check the generator by calling next, expecting it to return only one batch per call. However, it starts the full iteration immediately:

next(batches) # --> Performs full iteration 
next(batches)
next(batches)
next(batches)

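The same thing happens with a stripped-down recursive generator of the same shape (a toy function here, no HTTP involved), so it does not seem related to the API calls themselves:

def toy(page=1):
    print(f"fetching page {page}")
    if page < 3:
        yield from toy(page + 1)  # recurse before yielding, like page_helper
    yield page

g = toy()
next(g) # --> prints "fetching page 1", 2 and 3 all at once, then returns 3
next(g) # --> returns 2; pages come back in reverse order
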
Is there something wrong with my generator function?


Solution

  • Why is your generator recursive to begin with? The recursion is also what causes the behavior you're seeing: because the recursive yield from on page + 1 runs before yield response, the first call to next descends through every remaining page before anything is yielded, so all pages are fetched up front and then come back in reverse order. Just use a loop to do your pagination. If I've understood your intent correctly, this should do what you want:

    import time

    def get_all_pages(request, retry_interval_seconds=5, start_page=1, **kwargs):
        current_page = start_page
        while True:
            response = None
            try:
                response = request(params={**kwargs, "page": current_page})
                data = response.json()
            except Exception:
                # requests exposes status_code as an int, so compare to 429, not "429".
                # response is still None if the request itself failed before returning.
                if response is not None and response.status_code == 429:
                    time.sleep(retry_interval_seconds)
                    continue  # Try again without incrementing current_page
                raise
            yield data
            if len(data) != kwargs["limit"]:
                break
            current_page += 1
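
    As a quick sanity check, here is the loop version driven by a stubbed request callable (FakeResponse and fake_request below are hypothetical test doubles, standing in for whatever you pass as request). Each call to next now fetches exactly one page:

    class FakeResponse:
        # Minimal stand-in for a requests.Response object
        status_code = 200

        def __init__(self, data):
            self._data = data

        def json(self):
            return self._data

    def fake_request(params):
        print(f"fetching page {params['page']}")
        # Return full pages until page 3, then a short page to end pagination
        size = params["limit"] if params["page"] < 3 else 1
        return FakeResponse([f"item-{params['page']}-{i}" for i in range(size)])

    batches = get_all_pages(fake_request, limit=100)
    next(batches) # --> prints "fetching page 1" only; later pages are untouched
    next(batches) # --> prints "fetching page 2"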