The Stack Exchange API returns only 30 items per request. I used a for
loop to call the Stack Exchange API, as shown below, to get 4500 records.

    import json
    import requests

    complete_data = []
    for i in range(150):
        # Every iteration requests the same URL, with no page parameter
        response = requests.get("https://api.stackexchange.com/2.2/questions?order=desc&sort=activity&site=stackoverflow")
        newData = json.loads(response.text)
        for item in newData['items']:
            complete_data.append(item)

But when I analyzed the questions I got from the API, I found the same data set repeated 150 times, so every request in the code returned the same data. I need close to 5000 records for my analysis. Can anyone show me what changes I should make to my code?
You're actually fetching 30 items per request (the default page size) and requesting the same page (the first one) every time. Set pagesize (max 100, min 1) and page (i + 1) to solve the problem:

    import json
    import time
    import requests

    complete_data = []
    for i in range(45):  # 45 pages * 100 items = 4500 records
        response = requests.get("https://api.stackexchange.com/2.2/questions?order=desc&sort=activity&site=stackoverflow&pagesize=100&page=" + str(i + 1))
        newData = json.loads(response.text)
        for item in newData['items']:
            complete_data.append(item)
        print("Processed page " + str(i + 1) + ", returned " + str(response))
        time.sleep(2)  # pause between requests to avoid being rate-limited

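If you'd rather not hard-code the number of pages, here is a rough sketch of the same loop that keeps going until it has enough records or the API reports there are no more pages. It assumes the response wrapper carries the `has_more` and (optionally) `backoff` fields, which is how I understand the API's common wrapper object. The names `fetch_page` and `collect_questions` are just ones I made up for illustration, not part of any library.

```python
import time

API_URL = "https://api.stackexchange.com/2.2/questions"


def fetch_page(page, pagesize=100):
    """Request one page of questions and return the decoded JSON wrapper."""
    # Imported here so the paging logic below can be tried without the network.
    import requests

    params = {
        "order": "desc",
        "sort": "activity",
        "site": "stackoverflow",
        "pagesize": pagesize,
        "page": page,
    }
    response = requests.get(API_URL, params=params, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()


def collect_questions(target=4500, fetch=fetch_page, sleep=time.sleep):
    """Collect up to `target` items, one page at a time, starting at page 1."""
    items = []
    page = 1
    while True:
        data = fetch(page)
        items.extend(data.get("items", []))
        # Stop once we have enough, or when the API says this was the last page.
        if len(items) >= target or not data.get("has_more"):
            break
        # Assumption: "backoff" (seconds) appears only when the API asks us to wait.
        sleep(data.get("backoff", 0))
        page += 1
    return items[:target]
```

Calling `complete_data = collect_questions(4500)` would then replace the fixed `range(45)` loop. You may still want a small fixed pause between requests in addition to honoring `backoff`.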
Notes: