
Handling BrokenPipeError in Selenium Python Script during Failure to Open URLs and Upload to Google Drive


I am working on a Python script that navigates through a list of URLs, takes screenshots of the webpages, and then uploads them to Google Drive using Selenium, Google API, and GSP. The script should attempt to open each URL five times; if it fails to open the URL after five attempts, it is supposed to skip the current iteration with a continue statement and move to the next URL.

However, I am encountering a BrokenPipeError whenever the script fails to open a URL after the specified attempts. Instead of continuing to the next URL, the script stops execution, which is not the expected behavior. Below is the relevant part of the code:

max_attempts = 5

for record in records:
    url = record['Link']
    folder_id = record['Link to folder']
    successful_connection = False  # Flag to track if connection was successful

    for attempt in range(max_attempts):
        try:
            driver.get(url)
            time.sleep(random.uniform(1, 3))
            successful_connection = True  # Set the flag to True if successful
            break  # Exit the loop if successful
        except Exception as e:  # Catch the specific exception if possible
            print(f"Attempt {attempt + 1} of {max_attempts} failed: {str(e)}")
            time.sleep(10)  # Wait for 10 seconds before retrying

    if not successful_connection:
        print(f"Failed to connect to {url} after {max_attempts} attempts.")
        continue  # Skip the rest of the code in this loop iteration and move to the next record
    
    # If connection was successful, proceed with screenshot and upload
    current_date = datetime.now().strftime('%Y-%m-%d')
    page_width = driver.execute_script('return document.body.scrollWidth')
    page_height = driver.execute_script('return document.body.scrollHeight')
    screenshot_path = f"{current_date}-{record['Client']}-{record['Platform']}.png"
    driver.set_window_size(page_width, page_height)
    driver.save_screenshot(screenshot_path)

    # Upload to Google Drive
    file_metadata = {'name': screenshot_path, 'parents': [folder_id]}
    media = MediaFileUpload(screenshot_path, mimetype='image/png')
    file = drive_service.files().create(body=file_metadata, media_body=media, fields='id').execute()
    
    os.remove(screenshot_path)

driver.quit()

And the error:

    self._send_request(method, url, body, headers, encode_chunked)
  File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/http/client.py", line 1331, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/http/client.py", line 1280, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/http/client.py", line 1040, in _send_output
    self.send(msg)
  File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/http/client.py", line 1001, in send
    self.sock.sendall(data)
  File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/ssl.py", line 1238, in sendall
    v = self.send(byte_view[count:])
  File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/ssl.py", line 1207, in send
    return self._sslobj.write(data)
BrokenPipeError: [Errno 32] Broken pipe
Error: Process completed with exit code 1.

I suspect the issue might be related to the way exceptions are handled or how resources are managed, but I am not sure how to pinpoint the problem or resolve the BrokenPipeError. Any suggestions or insights into what might be causing this issue and how to fix it would be greatly appreciated.

I tried creating an empty PNG file and uploading it as a dummy when the connection is unsuccessful, but I still get the same error.


Solution

  • Specific Exception Handling: Catching a broad Exception can hide errors that have nothing to do with the connection. It's good practice to catch the specific exceptions that driver.get() can raise so each failure mode is handled appropriately: for example, TimeoutException for page-load timeouts and WebDriverException for general WebDriver issues. TimeoutException is a subclass of WebDriverException, so its except clause has to come first.

    from selenium.common.exceptions import TimeoutException, WebDriverException
    
    for attempt in range(max_attempts):
        try:
            driver.get(url)
            time.sleep(random.uniform(1, 3))
            successful_connection = True
            break
        except TimeoutException as e:
            print(f"Attempt {attempt + 1} of {max_attempts} failed: Timeout - {str(e)}")
            time.sleep(10)
        except WebDriverException as e:
            print(f"Attempt {attempt + 1} of {max_attempts} failed: WebDriver issue - {str(e)}")
            time.sleep(10)
        # Add more specific exceptions as needed
    

  • Logging: Consider using the logging module instead of print statements. It gives you control over log levels, formatting, and where the logs are sent.

    import logging
    
    logging.basicConfig(level=logging.INFO)
    
    for attempt in range(max_attempts):
        try:
            driver.get(url)
            time.sleep(random.uniform(1, 3))
            successful_connection = True
            break
        except TimeoutException as e:
            logging.error(f"Attempt {attempt + 1} of {max_attempts} failed: Timeout - {str(e)}")
            time.sleep(10)
        except WebDriverException as e:
            logging.error(f"Attempt {attempt + 1} of {max_attempts} failed: WebDriver issue - {str(e)}")
            time.sleep(10)
        # Add more specific exceptions as needed
    

  • Handling WebDriver Cleanup: Make sure the WebDriver is cleaned up even if an exception occurs. Wrapping the work in a try...finally block guarantees that driver.quit() is called.

    try:
        for record in records:
            ...  # your existing per-record loop body goes here
    finally:
        driver.quit()  # always runs, even if the loop raises
    

    These suggestions are meant to enhance the robustness and maintainability of your script. Depending on your specific use case and requirements, you might need to adjust the exception handling and logging approach accordingly.
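Since you asked how to pinpoint the problem: one thing that may help is logging the full traceback for each step rather than only str(e), so the log shows which call actually raised. Below is a minimal sketch of that idea. The names run_step and flaky_upload and the logger name are made up for illustration, and flaky_upload is only a stand-in that raises the error from your question; it is not your real upload code.

```python
import io
import logging


def run_step(step_name, func, logger):
    """Run one step; log the full traceback on failure and return True/False."""
    try:
        func()
        return True
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback
        logger.exception("Step %r failed", step_name)
        return False


def flaky_upload():
    # Hypothetical stand-in for the real upload call
    raise BrokenPipeError(32, "Broken pipe")


# Capture log output in memory so the example is self-contained
log_buffer = io.StringIO()
logger = logging.getLogger("screenshot_job")
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(log_buffer))
logger.propagate = False

ok = run_step("upload", flaky_upload, logger)
log_text = log_buffer.getvalue()
```

In a real script you would probably log to the console or a file instead of an in-memory buffer. The point is only that the traceback ends up in the log next to the name of the step that failed.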

    See what you think of this:

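    import logging
    import os
    import random
    import time
    from datetime import datetime

    from googleapiclient.http import MediaFileUpload
    from selenium.common.exceptions import TimeoutException, WebDriverException

    # driver, records and drive_service are assumed to be created earlier in your script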
    
    # Configure logging
    logging.basicConfig(level=logging.INFO)
    
    max_attempts = 5
    
    for record in records:
        url = record['Link']
        folder_id = record['Link to folder']
        successful_connection = False  # Flag to track if connection was successful
    
        for attempt in range(max_attempts):
            try:
                driver.get(url)
                time.sleep(random.uniform(1, 3))
                successful_connection = True  # Set the flag to True if successful
                break  # Exit the loop if successful
            except TimeoutException as e:
                logging.error(f"Attempt {attempt + 1} of {max_attempts} failed: Timeout - {str(e)}")
                time.sleep(10)
            except WebDriverException as e:
                logging.error(f"Attempt {attempt + 1} of {max_attempts} failed: WebDriver issue - {str(e)}")
                time.sleep(10)
            except Exception as e:  # Fallback for anything not covered by the clauses above
                logging.error(f"Attempt {attempt + 1} of {max_attempts} failed: {str(e)}")
                time.sleep(10)
    
        if not successful_connection:
            logging.error(f"Failed to connect to {url} after {max_attempts} attempts.")
            continue  # Skip the rest of the code in this loop iteration and move to the next record
    
        # If connection was successful, proceed with screenshot and upload
        current_date = datetime.now().strftime('%Y-%m-%d')
        page_width = driver.execute_script('return document.body.scrollWidth')
        page_height = driver.execute_script('return document.body.scrollHeight')
        screenshot_path = f"{current_date}-{record['Client']}-{record['Platform']}.png"
        driver.set_window_size(page_width, page_height)
        driver.save_screenshot(screenshot_path)
    
        # Upload to Google Drive
        file_metadata = {'name': screenshot_path, 'parents': [folder_id]}
        media = MediaFileUpload(screenshot_path, mimetype='image/png')
        file = drive_service.files().create(body=file_metadata, media_body=media, fields='id').execute()
    
        os.remove(screenshot_path)
    
    # Ensure proper cleanup
    try:
        driver.quit()
    except Exception as e:
        logging.error(f"Failed to quit the WebDriver: {str(e)}")
    

    In this modified script:

    Specific exceptions like TimeoutException and WebDriverException are caught separately for better error handling.

    Logging is used instead of print statements for better control and flexibility.

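    driver.quit() is wrapped in a try...except block so that a failure during cleanup is logged instead of raised. Note that this call is only reached if the loop finishes; to guarantee cleanup when the loop itself raises, wrap the loop in a try...finally block as described above.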

    Please make sure to adapt the script further based on your specific requirements and the environment in which it runs.
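One more thing worth checking: the traceback in the question goes through ssl.py, which suggests the failing request is an HTTPS call. That could be the Drive upload rather than driver.get(), in which case the error is raised outside the try/except around driver.get() and stops the script. This is a guess from a truncated traceback, not something I can confirm. If it is the case, a sketch like the following might help. It retries a call on connection errors and skips the record if it keeps failing. The names call_with_retries, process_records and fake_upload are hypothetical, and fake_upload only simulates failures; in your script the upload callable would wrap the drive_service.files().create(...).execute() call.

```python
import time


def call_with_retries(func, attempts=3, delay=0.0, retry_on=(ConnectionError,)):
    """Call func() up to `attempts` times, retrying only on `retry_on` errors.

    Returns func()'s result, or re-raises the last error once attempts run out.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay)


def process_records(records, upload, attempts=3, delay=0.0):
    """Upload every record, skipping (not aborting on) records that keep failing."""
    uploaded, skipped = [], []
    for record in records:
        try:
            call_with_retries(lambda: upload(record), attempts, delay)
            uploaded.append(record)
        except ConnectionError:
            # BrokenPipeError is a subclass of ConnectionError
            skipped.append(record)
    return uploaded, skipped


# Simulated upload: "b" always fails, "c" fails once and then succeeds
calls = {}


def fake_upload(record):
    calls[record] = calls.get(record, 0) + 1
    if record == "b" or (record == "c" and calls[record] == 1):
        raise BrokenPipeError(32, "Broken pipe")


uploaded, skipped = process_records(["a", "b", "c"], fake_upload)
```

With a real upload you would likely want a non-zero delay between attempts. If the Google client raises its own error type for failed requests (for example HttpError from googleapiclient.errors), you could add it to retry_on and to the except clause in process_records.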