python json stackexchange-api

How do I get a larger amount of data from the Stack Exchange API?


The Stack Exchange API returns only 30 items per request. To get 4500 records, I used a for loop to call the Stack Exchange API 150 times, as shown below:

import json
import requests

complete_data = []
for i in range(150):
    response = requests.get("https://api.stackexchange.com/2.2/questions?order=desc&sort=activity&site=stackoverflow")
    newData = json.loads(response.text)
    for item in newData['items']:
        complete_data.append(item)

But while analyzing the questions I got from the API, I found that the same data set had been received 150 times: every request in the code returned identical data. I need close to 5000 records for my analysis. What should I change in my code?
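One way to confirm the duplication described above is to count the distinct question_id values in complete_data (every question object returned by /questions carries a question_id field). The helper names below are hypothetical; this is a minimal sketch that works on any list of question dicts.

```python
from typing import Dict, List


def count_unique_questions(items: List[Dict]) -> int:
    """Number of distinct questions, identified by their question_id."""
    return len({item["question_id"] for item in items})


def dedupe_by_question_id(items: List[Dict]) -> List[Dict]:
    """Keep only the first occurrence of each question_id, preserving order."""
    seen = set()
    unique = []
    for item in items:
        qid = item["question_id"]
        if qid not in seen:
            seen.add(qid)
            unique.append(item)
    return unique
```

With the loop from the question, len(complete_data) is 4500 (150 × 30), while count_unique_questions(complete_data) would be expected to stay close to 30, because every request fetched the same first page.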


Solution

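  • You're fetching only 30 items per request (the default page size), and every request asks for the same page (the first one). Add the pagesize parameter (min 1, max 100) and the page parameter (i + 1) to solve the problem. With 100 items per page, 45 requests return 45 × 100 = 4500 records: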

    import json
    import time

    import requests

    complete_data = []
    for i in range(45):
        # page is 1-based, so request pages 1..45; pagesize=100 is the maximum allowed
        response = requests.get("https://api.stackexchange.com/2.2/questions?order=desc&sort=activity&site=stackoverflow&pagesize=100&page=" + str(i + 1))
        newData = json.loads(response.text)
        for item in newData['items']:
            complete_data.append(item)
        print("Processed page " + str(i + 1) + ", returned " + str(response))
        time.sleep(2)  # pause between requests to avoid being rate-limited
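
A slightly more defensive variant of the loop above, offered as a sketch rather than a definitive implementation. It passes the query through the params argument of requests.get instead of concatenating strings, stops when the response's has_more flag is false, waits longer when the response includes a backoff field, and drops repeated question_id values (with sort=activity, a question whose activity changes can move to a different page between requests and show up twice). The names fetch_page and collect_questions are hypothetical, and the fetcher is passed in as an argument so the paging logic can be tried offline with a fake.

```python
import time
from typing import Callable, Dict, List

API_URL = "https://api.stackexchange.com/2.2/questions"


def fetch_page(page: int, pagesize: int = 100) -> Dict:
    """Fetch one page of questions and return the decoded JSON wrapper."""
    # Imported here so collect_questions() can be exercised with a fake
    # fetcher even where requests is not installed.
    import requests

    params = {
        "order": "desc",
        "sort": "activity",
        "site": "stackoverflow",
        "pagesize": pagesize,
        "page": page,
    }
    response = requests.get(API_URL, params=params, timeout=30)
    response.raise_for_status()
    return response.json()


def collect_questions(
    target: int,
    fetch: Callable[[int, int], Dict] = fetch_page,
    pagesize: int = 100,
    delay: float = 2.0,
) -> List[Dict]:
    """Request successive pages until `target` unique questions are collected
    or the API reports that no more pages are available."""
    collected: List[Dict] = []
    seen_ids = set()
    page = 1
    while len(collected) < target:
        data = fetch(page, pagesize)
        for item in data.get("items", []):
            # Skip questions already collected from an earlier page
            if item["question_id"] not in seen_ids:
                seen_ids.add(item["question_id"])
                collected.append(item)
        if len(collected) >= target or not data.get("has_more", False):
            break
        # Honour the "backoff" field if present, otherwise use the fixed pause
        time.sleep(max(delay, data.get("backoff", 0)))
        page += 1
    return collected[:target]
```

Against the live API this would be called as collect_questions(5000); whether that many pages are actually served without an app key or access token is an assumption worth checking against the API's quota fields (for example quota_remaining) before relying on it.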
    

    Notes: