Tags: python, pandas, pagination

Pagination API Request in Python


I am trying to pull all pages from this specific endpoint in the API, and then convert the .json responses into a Pandas DataFrame. My mechanism is to look at a key in the .json response, nested in the "meta" key under `hasMore`. If there is another page of results, its value will be True; if there are no more pages, it will be False.
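
For reference, the relevant part of the JSON response is shaped roughly like this (shown as a Python dict; the key names other than the meta block are illustrative):

page_payload = {
    "tasklists": ["..."],           # this page's results (illustrative key name)
    "meta": {
        "page": {"hasMore": True},  # True while more pages remain
    },
}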

I can print the result of has_More and it comes back True, so I know it has found the correct value. However, when I run the function, I get back an empty list whether I collect the .json responses or DataFrames. I've been at this for a while trying new code, but I always get back empty lists.

import requests
import json
import pandas as pd

url = "https://xxxxxxxx.xxxxxxxxx.com/projects/api/v3/tasklists"

def get_all_tasklists(url, pageSize=100, page=1):

    response = requests.get(f'{url}?page={page}&pageSize={pageSize}',
                            headers = {"authorization": xxxxxxxxxxxx}
                            )
    
    meta_data = json.loads(response.text)['meta']
    has_More = meta_data['page']['hasMore']

    timelogs = []
  
    if has_More == True:
        x = pd.json_normalize(json.loads(response.text))
        timelogs.extend(x)
        page += 1
    else:
        print(timelogs)
    


get_all_tasklists(url, pageSize=100, page=1)

Solution

  • A couple problems:

    One, your code will only ever fetch one page per call. Given that you've named it get_all_xxx, this is probably not the intended behavior. Two, you're using list.extend where list.append is what you want: extend iterates over the DataFrame, which yields its column names, while append adds the DataFrame itself to the list (see the short demo below). Three, the function has no return statement, so it always returns None.
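
    A quick demonstration of the extend/append difference:

    import pandas as pd

    df = pd.json_normalize({"id": 1, "name": "a"})

    frames = []
    frames.extend(df)       # iterates the DataFrame, so you get its column names
    print(frames)           # ['id', 'name']

    frames = []
    frames.append(df)       # stores the DataFrame itself, which is what pd.concat needs
    print(len(frames))      # 1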

    I've tweaked and cleaned up your code; this should serve as a minimal working example of what you'll need. It's still missing error handling around the request itself (see the sketch after the code), but this should be a good base. Note that the pagination now happens inside a loop.

    import requests
    import json
    import pandas as pd
    
    url = "https://xxxxxxxx.xxxxxxxxx.com/projects/api/v3/tasklists"
    
    def get_all_tasklists(url, pageSize=100, page=1):
        dataframes = []
    
        while True:
            response = requests.get(
                f'{url}?page={page}&pageSize={pageSize}',
                headers = {"authorization": xxxxxxxxxxxx}
            )
    
            payload = json.loads(response.text)
            has_more = payload['meta']['page']['hasMore']
    
            # normalize this page and keep it as its own DataFrame
            x = pd.json_normalize(payload)
            dataframes.append(x)
      
            if has_more:
                page += 1
            else:
                return pd.concat(dataframes)
        
    df = get_all_tasklists(url, pageSize=100, page=1)
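
    For the error handling mentioned above, here's a minimal sketch (the function name and the token parameter are mine; the token stands in for your redacted header value, and raise_for_status() makes HTTP errors fail fast):

    import requests
    import pandas as pd

    def get_all_tasklists_safe(url, token, pageSize=100, page=1):
        dataframes = []

        while True:
            response = requests.get(
                url,
                params={"page": page, "pageSize": pageSize},  # requests builds the query string
                headers={"authorization": token},             # token is a placeholder argument
            )
            response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses

            payload = response.json()
            dataframes.append(pd.json_normalize(payload))

            if not payload['meta']['page']['hasMore']:
                return pd.concat(dataframes)
            page += 1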