Tags: r, pagination, httr2

Correct next_req() to paginate an API request using httr2?


I want to iteratively call a paginated API using httr2's req_perform_iterative() function, where each response provides the URL of the next page to request.

However, I cannot work out how to write the next_req argument or how to use the iteration helpers, such as iterate_with_cursor(), and the examples in the documentation are sparse. The API I am calling paginates by returning the next URL, so I need to follow that URL rather than increment a page number.

Could someone please help me form a correct next_req() function?

We can use the Rick and Morty API as an example:

library(httr2)

# request a single page and parse its JSON body
first_page <- request("https://rickandmortyapi.com/api/character?page=1") |>
  req_perform() |>
  resp_body_json()

# return the url for the next page
next_url <- first_page$info$`next`

How do I turn this into a working req_perform_iterative() function that will return multiple pages? Thanks!


Solution

  • You could use iterate_with_cursor() here, but it fits better when next in the response body is just the next page number rather than a complete URL.

    For this particular example (complete URL for the next page in the response body) it's probably easier to build a new iteration helper. We can take one of the existing ones as a template:

    library(httr2)
    
    # existing iteration helper, follows url found in the Link header:
    iterate_with_link_url()
    #> function (resp, req) 
    #> {
    #>     url <- resp_link_url(resp, rel)
    #>     if (!is.null(url)) {
    #>         req %>% req_url(url)
    #>     }
    #> }
    
    # custom helper based on iterate_with_link_url(),
    # follow next url from response body
    iterate_with_body_info_next <- function(resp, req) {
      url <- resp_body_json(resp)$info$`next`
      if (!is.null(url)) {
        req %>% req_url(url)
      }
    }
    
    resps <- 
      request("https://rickandmortyapi.com/api/character") |> 
      req_perform_iterative(
        next_req = iterate_with_body_info_next,
        max_reqs = 3
      )
    resps
    #> [[1]]
    #> <httr2_response>
    #> GET https://rickandmortyapi.com/api/character
    #> Status: 200 OK
    #> Content-Type: application/json
    #> Body: In memory (19496 bytes)
    #> 
    #> [[2]]
    #> <httr2_response>
    #> GET https://rickandmortyapi.com/api/character?page=2
    #> Status: 200 OK
    #> Content-Type: application/json
    #> Body: In memory (10380 bytes)
    #> 
    #> [[3]]
    #> <httr2_response>
    #> GET https://rickandmortyapi.com/api/character?page=3
    #> Status: 200 OK
    #> Content-Type: application/json
    #> Body: In memory (9723 bytes)
    
    # info object from the 1st request body
    str(resp_body_json(resps[[1]])$info)
    #> List of 4
    #>  $ count: int 826
    #>  $ pages: int 42
    #>  $ next : chr "https://rickandmortyapi.com/api/character?page=2"
    #>  $ prev : NULL
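
    The contract behind next_req can be checked without touching the network: the function receives the previous response and request, and returns the request for the next page, or NULL to end the iteration. The following is a minimal sketch of such a check, assuming httr2's response() constructor is an acceptable stand-in for real API responses. mock_json_response() is a hypothetical helper, and the JSON payloads are trimmed-down imitations of the real bodies.

    ```r
    library(httr2)

    # Custom helper from above: follow the next URL found in the response body
    iterate_with_body_info_next <- function(resp, req) {
      url <- resp_body_json(resp)$info$`next`
      if (!is.null(url)) {
        req_url(req, url)
      }
    }

    # Hypothetical helper (not part of httr2): wrap a JSON string in a mocked response
    mock_json_response <- function(json) {
      response(
        headers = "Content-Type: application/json",
        body = charToRaw(json)
      )
    }

    req <- request("https://rickandmortyapi.com/api/character")

    # Mocked page that points at a following page
    first_page <- mock_json_response(
      '{"info":{"next":"https://rickandmortyapi.com/api/character?page=2"}}'
    )
    # Mocked final page: `next` is JSON null, which is parsed as NULL
    last_page <- mock_json_response('{"info":{"next":null}}')

    # A request for page 2 is returned while a next URL is present
    following <- iterate_with_body_info_next(first_page, req)
    following$url

    # NULL is returned on the last page, which ends the iteration
    is.null(iterate_with_body_info_next(last_page, req))
    ```

    Because the helper returns NULL once `next` is missing, dropping max_reqs = 3 (or raising it) should let req_perform_iterative() walk through all remaining pages.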
    

    iterate_with_offset() would also work just fine here: name the URL parameter that should be incremented (param_name) and supply a function that extracts the total number of pages from the first response (resp_pages). You should end up with something like this:

    request("https://rickandmortyapi.com/api/character") |> 
      req_perform_iterative(
        next_req = iterate_with_offset(
          param_name = "page",
          resp_pages = \(resp) resp_body_json(resp)$info$pages
        ),
        max_reqs = 3
      )
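
    For completeness, the iterate_with_cursor() route mentioned at the top of this answer might look roughly like the sketch below. It assumes the page number can be recovered from the query string of the next URL with url_parse(). next_page_number() and mock_json_response() are hypothetical names introduced only for illustration, and the mocked responses imitate the API so the step can be tried offline.

    ```r
    library(httr2)

    # Hypothetical helper (not part of httr2): wrap a JSON string in a mocked response
    mock_json_response <- function(json) {
      response(
        headers = "Content-Type: application/json",
        body = charToRaw(json)
      )
    }

    # Hypothetical extractor: pull the page number out of the `next` URL,
    # or return NULL when there is no next page
    next_page_number <- function(resp) {
      next_url <- resp_body_json(resp)$info$`next`
      if (is.null(next_url)) {
        return(NULL)
      }
      url_parse(next_url)$query$page
    }

    # Iteration helper that sets the `page` query parameter to the extracted value
    next_req <- iterate_with_cursor("page", resp_param_value = next_page_number)

    req <- request("https://rickandmortyapi.com/api/character")
    page_1 <- mock_json_response(
      '{"info":{"next":"https://rickandmortyapi.com/api/character?page=2"}}'
    )
    last_page <- mock_json_response('{"info":{"next":null}}')

    # The follow-up request carries the page number as a query parameter
    following <- next_req(resp = page_1, req = req)
    following$url

    # No next page means no follow-up request
    is.null(next_req(resp = last_page, req = req))
    ```

    Passing next_req = next_req to req_perform_iterative() should then behave like the other two variants, although parsing a page number back out of a complete URL is more roundabout than simply following that URL.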
    

    Created on 2024-10-18 with reprex v2.1.1