python, web-scraping, beautifulsoup, web-crawler, network-connection

Handling Network Errors in a Python Web Crawler


I am building a web crawler for a large website. It keeps hitting closed connections, SSL errors, and other intermittent failures caused by an unstable network, so I am looking for a way to deal with this. My code is below; can anyone tell me how to implement a wait-and-retry so the crawl starts again once the network connection is back?

import requests

try:
    requests.get("http://example.com")
except requests.exceptions.RequestException:
    pass  # handle the exception, e.g. wait and try again later

Solution

  • Without trying to listen to the network interface itself, you can add a simple retry mechanism that waits and tries again whenever a request fails:

    import time

    import requests

    while True:
        try:
            requests.get("http://example.com")
            break  # you can also check the returned status before breaking the loop
        except requests.exceptions.RequestException:
            time.sleep(300)  # wait 5 minutes before retrying
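  • If you prefer to cap the number of attempts and back off gradually instead of always sleeping five minutes, a bounded exponential-backoff loop is one option. This is only a sketch: fetch_with_retry, MAX_RETRIES, and the timeout values are illustrative placeholders, not part of the original question.

    import time

    import requests

    MAX_RETRIES = 5  # give up after this many failed attempts

    def fetch_with_retry(url):
        delay = 1  # initial wait in seconds
        for attempt in range(MAX_RETRIES):
            try:
                response = requests.get(url, timeout=10)
                response.raise_for_status()  # treat HTTP error status codes as failures too
                return response
            except requests.exceptions.RequestException:
                if attempt == MAX_RETRIES - 1:
                    raise  # re-raise after the last attempt so the caller can decide what to do
                time.sleep(delay)
                delay *= 2  # back off: 1s, 2s, 4s, 8s, ...

    response = fetch_with_retry("http://example.com")

    If you would rather not write the loop yourself, requests can also retry automatically via urllib3's Retry class mounted on a Session with an HTTPAdapter.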