I have a Python script that downloads an image from a URL and uploads it to AWS S3. The script works perfectly when I run it on my local machine. However, when I deploy and run the same script on an AWS EC2 instance, I get a ReadTimeout error.
The error I'm receiving is as follows:
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.net-a-porter.com', port=443): Read timed out. (read timeout=100)
Below is the relevant part of my code:
import requests
import tempfile
import os

def upload_image_to_s3_from_url(self, image_url, filename, download_timeout=120):
    """
    Downloads an image from the given URL to a temporary file and uploads it to AWS S3,
    then returns the S3 file URL.
    """
    try:
        headers = {
            "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36",
            'Accept': 'image/avif,image/webp,image/apng,image/*,*/*;q=0.8'
        }
        # Request the image
        response = requests.get(image_url, timeout=download_timeout, stream=True, headers=headers)
        response.raise_for_status()
        # Determine the content type
        content_type = response.headers.get('Content-Type', 'image/jpeg')  # Default to image/jpeg
        # Create a temporary file
        with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
            # Write the response content to the temporary file
            for chunk in response.iter_content(chunk_size=8192):
                tmp_file.write(chunk)
        # Now that we have the image locally, upload it to S3 with the correct content type
        file_url = self.upload_image_to_s3(tmp_file.name, filename, content_type)
        # Optionally, delete the temporary file here if you set delete=False
        os.unlink(tmp_file.name)
        return file_url
    except requests.RequestException as e:
        raise Exception(f"Failed to download or upload image. Error: {e}")
# Example URL causing issues
image_url = "https://www.net-a-porter.com/variants/images/1647597326276381/in/w1365_a3-4_q60.jpg"
This issue occurs when trying to download an image from www.net-a-porter.com. The timeout is set to 120 seconds, which I assumed would be more than enough.
What I've tried so far: setting a User-Agent in the request headers.
Any insights or suggestions on how to resolve this issue would be greatly appreciated.
In my testing, the web server responds when a specific set of headers is sent. I'm not sure whether this behavior is intentional. I changed the User-Agent and added the additional headers below, and the request then got a response:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
    'Accept': 'image/avif,image/webp,image/apng,image/*,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive'
}
Could you try with this?
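For reference, here is a minimal sketch of how the download step could look with these headers. It is not a drop-in replacement for your method: the names `BROWSER_HEADERS` and `download_to_temp_file`, the optional `session` parameter, and the `(10, 120)` timeout values are my own assumptions for illustration. It passes `timeout` as a `(connect, read)` tuple so a stalled connection fails quickly; note that the read timeout in requests applies to each wait for data from the server, not to the whole download.

```python
import os
import tempfile

# Headers copied from the answer above. Assumption: 'br' is only decoded
# by requests/urllib3 when a brotli package is installed, so consider
# dropping it from Accept-Encoding if it is not available.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Accept": "image/avif,image/webp,image/apng,image/*,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
}


def download_to_temp_file(url, session=None, timeout=(10, 120), chunk_size=8192):
    """Stream `url` into a temporary file and return (path, content_type).

    `timeout` is a (connect, read) tuple in seconds (hypothetical defaults).
    The caller is responsible for deleting the returned file.
    """
    if session is None:
        # Imported lazily so that a preconfigured session can be passed in.
        import requests
        session = requests.Session()

    tmp_file = tempfile.NamedTemporaryFile(delete=False)
    try:
        with tmp_file, session.get(
            url, headers=BROWSER_HEADERS, timeout=timeout, stream=True
        ) as response:
            response.raise_for_status()
            content_type = response.headers.get("Content-Type", "image/jpeg")
            for chunk in response.iter_content(chunk_size=chunk_size):
                tmp_file.write(chunk)
    except Exception:
        # The file is already closed here; remove the partial download.
        os.unlink(tmp_file.name)
        raise
    return tmp_file.name, content_type
```

In your method, this could stand in for the request and temporary-file steps: call `path, content_type = download_to_temp_file(image_url)`, pass `path` to your existing `self.upload_image_to_s3(...)`, and then `os.unlink(path)`.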