I am pulling data using an API, Python, and the requests package. I want to pull all the data, but have only been able to pull 4,000 rows. How do I pull all of the data? The number of pages is not present in the response. I don't know how many rows are in the data, but it's more than 4,000.
Here is the working code that can pull 4,000 rows, though some of the details need to remain private:
```python
import requests

headers = {
    'accept': '*/*',
    'Authorization': 'Bearer <generated_token_put_here>',
    'Content-Type': 'application/json',
}
# These are the largest pageSize and pageNumber values that still return data.
data = '{"pageSize": 2000, "pageNumber": 100}'
# The CA bundle path belongs in requests' verify= argument, not in the HTTP headers.
response = requests.post('<api_endpoint_put_here>', headers=headers,
                         verify='/etc/ssl/certs/ca-certificates.crt', data=data)
```
When you don't know the total number of pages, a simple method is to request the pages one after another and stop as soon as a page comes back with fewer rows than the page size: that page is the last one. If the total row count happens to be an exact multiple of the page size, the last page is full and looks like any other, but requesting the page after it returns an empty set of data, which also tells you to stop.
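The stopping rule can be exercised without touching the real endpoint. Below is a minimal sketch in which `fetch_all_pages` accepts any function that returns one page as a list, and `make_fake_api` is a hypothetical in-memory stand-in for the API that assumes 1-based page numbers. It covers both exit conditions: a short final page, and an empty page after an exactly full last page.

```python
from typing import Callable, List


def fetch_all_pages(fetch_page: Callable[[int, int], List[dict]],
                    page_size: int, first_page: int = 1) -> List[dict]:
    """Collect rows page by page until a short or empty page signals the end."""
    rows: List[dict] = []
    page_number = first_page
    while True:
        page = fetch_page(page_number, page_size)
        if not page:
            # Empty page: the previous page was the last one
            # (the total was an exact multiple of page_size).
            break
        rows.extend(page)
        if len(page) < page_size:
            # Short page: this is the last one.
            break
        page_number += 1
    return rows


def make_fake_api(total_rows: int) -> Callable[[int, int], List[dict]]:
    """Hypothetical stand-in for the paged endpoint (1-based page numbers)."""
    data = [{"id": i} for i in range(total_rows)]

    def fetch_page(page_number: int, page_size: int) -> List[dict]:
        start = (page_number - 1) * page_size
        return data[start:start + page_size]

    return fetch_page
```

With 4,500 fake rows and a page size of 2,000, the loop stops after the third (500-row) page; with exactly 4,000 rows it needs a third request, which comes back empty.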
Here's an implementation of that approach (you may have to adjust it slightly depending on how the API formats its data):
```python
import requests

url = '<api_endpoint_put_here>'
headers = {
    'accept': '*/*',
    'Authorization': 'Bearer <generated_token_put_here>',
    'Content-Type': 'application/json',
}
ca_bundle = '/etc/ssl/certs/ca-certificates.crt'  # passed via verify=, not as a header

page_size = 2000
page_number = 1  # use 0 instead if the API's pages are 0-indexed
all_data = []

while True:
    payload = {"pageSize": page_size, "pageNumber": page_number}
    response = requests.post(url, headers=headers, json=payload, verify=ca_bundle)
    if response.status_code != 200:
        print(f"Error: Received status code {response.status_code}")
        break

    # Adjust if the response is not JSON, or use response.json()['items']
    # if the rows are nested under an 'items' key.
    response_data = response.json()

    if not response_data:
        # Empty page: the previous page was the last one and was exactly page_size rows.
        break

    all_data.extend(response_data)

    if len(response_data) < page_size:
        # Short page: you've reached the last page.
        break

    page_number += 1

print(f"Total rows pulled: {len(all_data)}")
```
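If the API ignores `pageNumber`, or keeps returning data for any page number you send, a loop like this can run forever or silently collect duplicates. A minimal, hedged variant is sketched below as a generator. `iter_rows`, `max_pages`, and `items_key` are hypothetical names; `session` is assumed to behave like a `requests.Session` (it only needs a `.post()` method returning an object with `.raise_for_status()` and `.json()`), and the request body assumes the same `pageSize`/`pageNumber` fields as in the question. It raises on HTTP errors instead of stopping quietly, caps the number of pages, and refuses to continue if a page is identical to the one before it.

```python
from typing import Any, Dict, Iterator, Optional


def iter_rows(session: Any, url: str, page_size: int = 2000, first_page: int = 1,
              max_pages: int = 10_000,
              items_key: Optional[str] = None) -> Iterator[Dict[str, Any]]:
    """Yield rows from a paged POST endpoint, one page at a time.

    `session` is assumed to behave like requests.Session (only .post() is used).
    """
    previous_page = None
    for page_number in range(first_page, first_page + max_pages):
        # Assumed request body, mirroring the question's pageSize/pageNumber fields.
        response = session.post(url, json={"pageSize": page_size, "pageNumber": page_number})
        response.raise_for_status()  # raise on 4xx/5xx instead of stopping silently
        payload = response.json()
        # items_key is hypothetical: set it if the rows are nested, e.g. 'items'.
        page = payload[items_key] if items_key else payload
        if not page:
            return  # empty page: the previous page was the last one
        if page == previous_page:
            raise RuntimeError(
                f"page {page_number} repeated the previous page; is pageNumber being ignored?")
        yield from page
        if len(page) < page_size:
            return  # short page: this was the last one
        previous_page = page
    raise RuntimeError(f"stopped after {max_pages} pages without reaching the end")
```

With a real session you would call it roughly as `list(iter_rows(session, '<api_endpoint_put_here>'))` after setting `session.headers.update(headers)` and `session.verify = '/etc/ssl/certs/ca-certificates.crt'`. Raising on a repeated page is a deliberate choice: returning there instead would hide the case where the server ignores `pageNumber` altogether.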