python, python-3.x, python-requests, python-requests-html

Get page with requests without response or status code


I use the following source code:

import requests

url = "https://www.baha.com/nasdaq-100-index/index/tts-751307/name/asc/1/index/performance/471"

web = requests.get(url)
print(web.status_code)

url = "https://www.baha.com/adobe/stocks/details/tts-117450574"
web = requests.get(url)
print(web.status_code)

url = "https://www.baha.com/advanced-micro-devices/stocks/details/tts-117449963"
web = requests.get(url)
print(web.status_code)

url = "https://www.baha.com/airbnb-inc/stocks/details/tts-208432020"
web = requests.get(url)
print(web.status_code)

url = "https://www.baha.com/alphabet-a/stocks/details/tts-117453820"
web = requests.get(url)
print(web.status_code)

url = "https://www.baha.com/alphabet-c/stocks/details/tts-117453810"
web = requests.get(url)
print(web.status_code)

Most of the time only the first three pages can be fetched; after that there is no status code and the program seems to stop responding, or I sometimes get a 503 response even though I can open the page in a browser.

Why does this problem arise, and how can I solve it?


Solution

  • The problem arises from the flood of HTTP requests that your script issues. Apparently, http://www.baha.com has some security measures that prevent it from being DDoSed by a single host issuing too many simultaneous (or nearly parallel) HTTP requests.

    You can prevent it by adding an artificial delay between the requests, as @robert-haas suggests:

    at the beginning of the script:

    import time
    

    then after each requests.get:

    time.sleep(0.1)
    

    for a 100 ms wait between requests. You can tune that value until you no longer run into the issue; I can imagine you may even want to put a time.sleep(3) in every now and then, or the DDoS prevention will stop you again after several requests. A consolidated sketch follows below.
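
    As an illustration only (not part of the original answer), here is a minimal sketch that loops over the URLs from the question with a delay after each request; the requests.Session, the timeout argument, and the exact delay values are assumptions you may need to adjust:

    import time

    import requests

    # The URLs from the question.
    urls = [
        "https://www.baha.com/nasdaq-100-index/index/tts-751307/name/asc/1/index/performance/471",
        "https://www.baha.com/adobe/stocks/details/tts-117450574",
        "https://www.baha.com/advanced-micro-devices/stocks/details/tts-117449963",
        "https://www.baha.com/airbnb-inc/stocks/details/tts-208432020",
        "https://www.baha.com/alphabet-a/stocks/details/tts-117453820",
        "https://www.baha.com/alphabet-c/stocks/details/tts-117453810",
    ]

    DELAY = 0.1    # 100 ms between requests; increase if you still get blocked
    BACKOFF = 3.0  # longer pause after a 503, per the suggestion above

    with requests.Session() as session:  # reuses the TCP connection (optional)
        for url in urls:
            web = session.get(url, timeout=10)  # timeout so the script cannot hang forever
            print(web.status_code)
            # Back off for longer when the server signals it is throttling us.
            time.sleep(BACKOFF if web.status_code == 503 else DELAY)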