I have a list of ~250,000 URLs that I need to fetch data for from an API.
I have created a class using the grequests library to make asynchronous calls. However, the API limit is 100 calls per second, which grequests exceeds.
Code using grequests:
import grequests

lst = ['url.com', 'url2.com']

class Test:
    def __init__(self):
        self.urls = lst

    def exception(self, request, exception):
        print("Problem: {}: {}".format(request.url, exception))

    def async(self):
        return grequests.map((grequests.get(u) for u in self.urls),
                             exception_handler=self.exception, size=100000)

    def collate_responses(self, results):
        return [x.text for x in results]

test = Test()

# here we collect the results returned by the async function
results = test.async()
Is there any way I can use the requests library to make 100 calls per second?
I tried plain requests, but it times out after roughly 100,000 calls.
In this case I am passing an ID into the URL.
import requests
import time

L = [1, 2, 3]
lst = []

for i in L:
    url = 'url.com/Id={}'.format(i)
    xml_data1 = requests.get(url).text
    lst.append(xml_data1)
    time.sleep(1)
    print(xml_data1)
Use multithreading.
from multiprocessing.dummy import Pool as ThreadPool
import requests

lst = []  # collected responses

def some_fun(url):
    # fetch a single url and store the response body
    xml_data1 = requests.get(url).text
    lst.append(xml_data1)
    print(xml_data1)

if __name__ == '__main__':
    urls = ['url.com', 'url2.com']
    c_pool = ThreadPool(30)  # add as many threads as your machine and the API allow
    c_pool.map(some_fun, urls)
    c_pool.close()
    c_pool.join()
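Note that a thread pool on its own will still burst well past the API's 100-calls-per-second cap, so you also need to throttle the calls. Below is a minimal sketch of one way to do that, assuming the url.com/Id={} endpoint from your question; the RateLimiter class and fetch_id helper are made-up names for illustration, not part of requests or any library. Each worker acquires the shared limiter before making its call, which spaces requests roughly 10 ms apart across all threads.

import threading
import time

import requests
from multiprocessing.dummy import Pool as ThreadPool

MAX_CALLS_PER_SECOND = 100  # API limit from the question

class RateLimiter:
    """Allow at most `rate` acquisitions per second across all threads."""
    def __init__(self, rate):
        self.min_interval = 1.0 / rate
        self.lock = threading.Lock()
        self.next_time = time.time()

    def acquire(self):
        with self.lock:
            now = time.time()
            wait = self.next_time - now
            # schedule the next allowed call one interval later
            self.next_time = max(now, self.next_time) + self.min_interval
        if wait > 0:
            time.sleep(wait)

limiter = RateLimiter(MAX_CALLS_PER_SECOND)

def fetch_id(i):
    # hypothetical endpoint pattern copied from the question
    limiter.acquire()
    return requests.get('url.com/Id={}'.format(i), timeout=30).text

if __name__ == '__main__':
    ids = range(1, 251)  # stand-in for the ~250,000 IDs
    pool = ThreadPool(30)
    results = pool.map(fetch_id, ids)
    pool.close()
    pool.join()

With this in place you can raise the thread count as high as you like; the limiter, not the pool size, decides the overall request rate.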
Cheers!