I want to iteratively call a paginated API using httr2's req_perform_iterative() function, where each response provides the URL of the next page to request.
However, I cannot work out how to write the next_req argument or how to use the iteration helpers, such as iterate_with_cursor(), and the examples in the documentation are sparse. I need to follow the next URL rather than increment a page number, because that is how the API I am calling paginates.
Could someone please help me write a correct next_req function?
We can use the Rick and Morty API as an example:
library(httr2)
# request a single page and parse the JSON body
body <- request("https://rickandmortyapi.com/api/character?page=1") |>
  req_perform() |>
  resp_body_json()
# extract the URL of the next page
next_url <- body$info$`next`
How do I turn this into a working req_perform_iterative() call that returns multiple pages? Thanks!
While you could use iterate_with_cursor() here, it would be a better fit if the next field in the response body were just the next page number.
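For comparison, here is a minimal sketch of how iterate_with_cursor() could still be applied by pulling the page number out of the next URL. The helper name next_page is hypothetical (it is not part of httr2), and the fake response built with response_json() is made-up data that only mimics the shape of the API body so the check runs offline:

```r
library(httr2)

# Hypothetical helper: read the page number out of info$next,
# or return NULL when there is no next page (NULL stops the iteration)
next_page <- function(resp) {
  next_url <- resp_body_json(resp)$info$`next`
  if (is.null(next_url)) {
    return(NULL)
  }
  url_parse(next_url)$query$page
}

# Iteration helper that sets ?page=<value> on each follow-up request
next_req <- iterate_with_cursor("page", resp_param_value = next_page)

# Offline check with a made-up response shaped like the API body
fake_resp <- response_json(body = list(
  info = list(`next` = "https://rickandmortyapi.com/api/character?page=2")
))
next_page(fake_resp)
```

Passing next_req to req_perform_iterative() should then page through the API, but the extra URL parsing is the reason a dedicated helper reads more naturally here.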
For this particular example (a complete URL for the next page in the response body), it is probably easier to build a new iteration helper, using one of the existing ones as a template:
library(httr2)
# existing iteration helper, follows url found in the Link header:
iterate_with_link_url()
#> function (resp, req)
#> {
#>     url <- resp_link_url(resp, rel)
#>     if (!is.null(url)) {
#>         req %>% req_url(url)
#>     }
#> }
# custom helper based on iterate_with_link_url(),
# follow next url from response body
iterate_with_body_info_next <- function(resp, req) {
  url <- resp_body_json(resp)$info$`next`
  if (!is.null(url)) {
    req %>% req_url(url)
  }
}
resps <-
  request("https://rickandmortyapi.com/api/character") |>
  req_perform_iterative(
    next_req = iterate_with_body_info_next,
    max_reqs = 3
  )
resps
#> [[1]]
#> <httr2_response>
#> GET https://rickandmortyapi.com/api/character
#> Status: 200 OK
#> Content-Type: application/json
#> Body: In memory (19496 bytes)
#>
#> [[2]]
#> <httr2_response>
#> GET https://rickandmortyapi.com/api/character?page=2
#> Status: 200 OK
#> Content-Type: application/json
#> Body: In memory (10380 bytes)
#>
#> [[3]]
#> <httr2_response>
#> GET https://rickandmortyapi.com/api/character?page=3
#> Status: 200 OK
#> Content-Type: application/json
#> Body: In memory (9723 bytes)
# info object from the 1st request body
str(resp_body_json(resps[[1]])$info)
#> List of 4
#> $ count: int 826
#> $ pages: int 42
#> $ next : chr "https://rickandmortyapi.com/api/character?page=2"
#> $ prev : NULL
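Once the pages are fetched, the per-page results usually need to be combined. A minimal sketch using resps_data(), assuming the records sit in a results field as they do in this API; the two fake pages built with response_json() are stand-ins with made-up content so the snippet runs offline:

```r
library(httr2)

# Stand-ins for two fetched pages (illustrative data, same shape as the API body)
fake_resps <- list(
  response_json(body = list(
    info = list(`next` = "https://rickandmortyapi.com/api/character?page=2"),
    results = list(
      list(id = 1, name = "Rick Sanchez"),
      list(id = 2, name = "Morty Smith")
    )
  )),
  response_json(body = list(
    info = list(pages = 2),
    results = list(list(id = 3, name = "Summer Smith"))
  ))
)

# Extract `results` from every response and concatenate them into one list
characters <- resps_data(fake_resps, \(resp) resp_body_json(resp)$results)

length(characters)
vapply(characters, \(x) x$name, character(1))
```

With the real resps object from above, the same resps_data() call should return one list containing the characters from all fetched pages.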
That said, iterate_with_offset() would also work just fine: name the URL parameter to increment and supply a function that extracts the total number of pages from the first response, and you should end up with something like this:
request("https://rickandmortyapi.com/api/character") |>
  req_perform_iterative(
    next_req = iterate_with_offset(
      param_name = "page",
      resp_pages = \(resp) resp_body_json(resp)$info$pages
    ),
    max_reqs = 3
  )
Created on 2024-10-18 with reprex v2.1.1