I am working on a Python script that navigates through a list of URLs, takes screenshots of the webpages, and then uploads them to Google Drive using Selenium, Google API, and GSP. The script should attempt to open each URL five times; if it fails to open the URL after five attempts, it is supposed to skip the current iteration with a continue statement and move to the next URL.
However, I am encountering a BrokenPipeError whenever the script fails to open a URL after the specified attempts. Instead of continuing to the next URL, the script stops execution, which is not the expected behavior. Below is the relevant part of the code:
```python
max_attempts = 5
for record in records:
    url = record['Link']
    folder_id = record['Link to folder']
    successful_connection = False  # Flag to track if connection was successful
    for attempt in range(max_attempts):
        try:
            driver.get(url)
            time.sleep(random.uniform(1, 3))
            successful_connection = True  # Set the flag to True if successful
            break  # Exit the loop if successful
        except Exception as e:  # Catch the specific exception if possible
            print(f"Attempt {attempt + 1} of {max_attempts} failed: {str(e)}")
            time.sleep(10)  # Wait for 10 seconds before retrying
    if not successful_connection:
        print(f"Failed to connect to {url} after {max_attempts} attempts.")
        continue  # Skip the rest of the code in this loop iteration and move to the next record
    # If connection was successful, proceed with screenshot and upload
    current_date = datetime.now().strftime('%Y-%m-%d')
    page_width = driver.execute_script('return document.body.scrollWidth')
    page_height = driver.execute_script('return document.body.scrollHeight')
    screenshot_path = f"{current_date}-{record['Client']}-{record['Platform']}.png"
    driver.set_window_size(page_width, page_height)
    driver.save_screenshot(screenshot_path)
    # Upload to Google Drive
    file_metadata = {'name': screenshot_path, 'parents': [folder_id]}
    media = MediaFileUpload(screenshot_path, mimetype='image/png')
    file = drive_service.files().create(body=file_metadata, media_body=media, fields='id').execute()
    os.remove(screenshot_path)
driver.quit()
```
And the error:
self._send_request(method, url, body, headers, encode_chunked)
File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/http/client.py", line 1331, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/http/client.py", line 1280, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/http/client.py", line 1040, in _send_output
self.send(msg)
File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/http/client.py", line 1001, in send
self.sock.sendall(data)
File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/ssl.py", line 1238, in sendall
v = self.send(byte_view[count:])
File "/opt/hostedtoolcache/Python/3.9.18/x64/lib/python3.9/ssl.py", line 1207, in send
return self._sslobj.write(data)
BrokenPipeError: [Errno 32] Broken pipe
Error: Process completed with exit code 1.
I suspect the issue might be related to the way exceptions are handled or how resources are managed, but I am not sure how to pinpoint the problem or resolve the BrokenPipeError. Any suggestions or insights into what might be causing this issue and how to fix it would be greatly appreciated.
I tried creating an empty PNG file and uploading it as a dummy when the connection is unsuccessful, but I still get the same error.
Specific Exception Handling: Catching a broad Exception might catch more than just connection-related issues. It's good practice to catch more specific exceptions that might be raised by driver.get() to handle different error scenarios more appropriately. For example, you might want to catch TimeoutException for timeouts, WebDriverException for general WebDriver issues, or others depending on your use case.
```python
from selenium.common.exceptions import TimeoutException, WebDriverException

for attempt in range(max_attempts):
    try:
        driver.get(url)
        time.sleep(random.uniform(1, 3))
        successful_connection = True
        break
    except TimeoutException as e:
        print(f"Attempt {attempt + 1} of {max_attempts} failed: Timeout - {str(e)}")
        time.sleep(10)
    except WebDriverException as e:
        print(f"Attempt {attempt + 1} of {max_attempts} failed: WebDriver issue - {str(e)}")
        time.sleep(10)
    # Add more specific exceptions as needed
```
Logging: Consider using the logging module instead of print statements for logging. This allows you to have more control over log levels, formatting, and directing logs to different outputs.
```python
import logging

logging.basicConfig(level=logging.INFO)

for attempt in range(max_attempts):
    try:
        driver.get(url)
        time.sleep(random.uniform(1, 3))
        successful_connection = True
        break
    except TimeoutException as e:
        logging.error(f"Attempt {attempt + 1} of {max_attempts} failed: Timeout - {str(e)}")
        time.sleep(10)
    except WebDriverException as e:
        logging.error(f"Attempt {attempt + 1} of {max_attempts} failed: WebDriver issue - {str(e)}")
        time.sleep(10)
    # Add more specific exceptions as needed
```
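The retry-and-log pattern above could also be factored into a small helper so the main loop stays short. This is only a sketch: the name `retry` and its parameters (`action`, `delay`, `exceptions`, `sleep`) are hypothetical and do not come from Selenium or the original script. The `sleep` parameter exists only so the wait can be replaced in tests.

```python
import logging
import time


def retry(action, max_attempts=5, delay=10.0, exceptions=(Exception,), sleep=time.sleep):
    """Call action() until it succeeds or max_attempts is reached.

    Returns True on success and False if every attempt raised one of
    the exception types listed in `exceptions`.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            action()
            return True
        except exceptions as exc:
            logging.error("Attempt %d of %d failed: %s", attempt, max_attempts, exc)
            if attempt < max_attempts:
                sleep(delay)  # wait before the next attempt, but not after the last one
    return False
```

In the script this might be used as `if not retry(lambda: driver.get(url), exceptions=(TimeoutException, WebDriverException)): continue`, which replaces the `successful_connection` flag.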
Handling WebDriver Cleanup: Ensure that you handle the WebDriver cleanup even if an exception occurs. You might want to use a try...finally block to make sure driver.quit() is called.
```python
try:
    ...  # Your existing code goes here
finally:
    driver.quit()
```
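To illustrate what `finally` guarantees, here is a minimal, self-contained sketch. `FakeDriver` and `run` are hypothetical stand-ins written for this demo, not part of Selenium. Note that the exception still propagates to the caller; `finally` only ensures that cleanup happens first.

```python
class FakeDriver:
    """Hypothetical stand-in for a Selenium WebDriver, used only for this demo."""

    def __init__(self):
        self.closed = False

    def get(self, url):
        # Simulate a page that can never be opened.
        raise RuntimeError(f"cannot open {url}")

    def quit(self):
        self.closed = True


def run(driver, urls):
    """Visit each URL; the finally clause runs whether or not get() raises."""
    try:
        for url in urls:
            driver.get(url)
    finally:
        driver.quit()


driver = FakeDriver()
try:
    run(driver, ["https://example.com"])
except RuntimeError:
    pass  # the error still reaches the caller, but quit() has already run
```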
These suggestions are meant to enhance the robustness and maintainability of your script. Depending on your specific use case and requirements, you might need to adjust the exception handling and logging approach accordingly.
See what you think of this:
```python
import logging
import os
import random
import time
from datetime import datetime

from googleapiclient.http import MediaFileUpload
from selenium.common.exceptions import TimeoutException, WebDriverException

# Configure logging
logging.basicConfig(level=logging.INFO)

# driver, records, and drive_service are set up earlier in the script
max_attempts = 5
for record in records:
    url = record['Link']
    folder_id = record['Link to folder']
    successful_connection = False  # Flag to track if connection was successful
    for attempt in range(max_attempts):
        try:
            driver.get(url)
            time.sleep(random.uniform(1, 3))
            successful_connection = True  # Set the flag to True if successful
            break  # Exit the loop if successful
        except TimeoutException as e:
            logging.error(f"Attempt {attempt + 1} of {max_attempts} failed: Timeout - {str(e)}")
            time.sleep(10)
        except WebDriverException as e:
            logging.error(f"Attempt {attempt + 1} of {max_attempts} failed: WebDriver issue - {str(e)}")
            time.sleep(10)
        except Exception as e:  # Fallback for any other unexpected exception
            logging.error(f"Attempt {attempt + 1} of {max_attempts} failed: {str(e)}")
            time.sleep(10)
    if not successful_connection:
        logging.error(f"Failed to connect to {url} after {max_attempts} attempts.")
        continue  # Skip the rest of the code in this loop iteration and move to the next record
    # If connection was successful, proceed with screenshot and upload
    current_date = datetime.now().strftime('%Y-%m-%d')
    page_width = driver.execute_script('return document.body.scrollWidth')
    page_height = driver.execute_script('return document.body.scrollHeight')
    screenshot_path = f"{current_date}-{record['Client']}-{record['Platform']}.png"
    driver.set_window_size(page_width, page_height)
    driver.save_screenshot(screenshot_path)
    # Upload to Google Drive
    file_metadata = {'name': screenshot_path, 'parents': [folder_id]}
    media = MediaFileUpload(screenshot_path, mimetype='image/png')
    file = drive_service.files().create(body=file_metadata, media_body=media, fields='id').execute()
    os.remove(screenshot_path)

# Ensure proper cleanup
try:
    driver.quit()
except Exception as e:
    logging.error(f"Failed to quit the WebDriver: {str(e)}")
```
In this modified script:
Specific exceptions like TimeoutException and WebDriverException are caught separately for better error handling.
Logging is used instead of print statements for better control and flexibility.
The driver.quit() call is wrapped in a try...except block so that a failure during cleanup is logged instead of raising a new error.
Please make sure to adapt the script further based on your specific requirements and the environment in which it runs.
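One further observation: because the BrokenPipeError terminated the process, it must have been raised by a call outside the retry loop's try block, and the traceback passes through ssl.py, so it occurred while sending an HTTPS request. Which call that was cannot be determined from the truncated traceback; the Drive upload is one candidate, but that is a guess. A possible way to keep the loop going regardless is to isolate the per-record work, as in the sketch below. The names `process_all` and `handle_record` are hypothetical, and catching `OSError` is an assumption that may need widening for other error types.

```python
import logging


def process_all(records, handle_record, exceptions=(OSError,)):
    """Run handle_record on every record, logging failures instead of stopping.

    Returns the list of records whose handler raised one of `exceptions`.
    BrokenPipeError is a subclass of OSError, so the default covers it.
    """
    failed = []
    for record in records:
        try:
            handle_record(record)
        except exceptions as exc:
            logging.error("Skipping %r: %s", record, exc)
            failed.append(record)
    return failed
```

Here `handle_record` would contain the screenshot and upload steps for a single record, so an error in one upload is logged and the next URL is still processed.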