
ReadTimeout error when downloading images on AWS EC2 but not locally


I have a Python script to download an image from a URL and upload it to AWS S3. This script works perfectly when I run it on my local machine. However, when I deploy and run the same script on an AWS EC2 instance, I encounter a ReadTimeout error.

The error I'm receiving is as follows:

requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www.net-a-porter.com', port=443): Read timed out. (read timeout=100)

Below is the relevant part of my code:

import requests
import tempfile
import os

def upload_image_to_s3_from_url(self, image_url, filename, download_timeout=120):
    """
    Downloads an image from the given URL to a temporary file, uploads it to AWS S3,
    and returns the S3 file URL. (Method of a class that also defines upload_image_to_s3.)
    """
    try:
        headers = {
            "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36",
            'Accept': 'image/avif,image/webp,image/apng,image/*,*/*;q=0.8'
        }
        # Request the image; the body is streamed in chunks below
        response = requests.get(image_url, timeout=download_timeout, stream=True, headers=headers)
        response.raise_for_status()

        # Determine the content type, defaulting to image/jpeg
        content_type = response.headers.get('Content-Type', 'image/jpeg')

        # Write the body to a temporary file; delete=False keeps it on disk after close()
        with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
            for chunk in response.iter_content(chunk_size=8192):
                tmp_file.write(chunk)

        # Upload only after the with block has closed (and flushed) the file,
        # so the S3 upload reads the complete image
        try:
            file_url = self.upload_image_to_s3(tmp_file.name, filename, content_type)
        finally:
            # Delete the temporary file even if the upload fails
            os.unlink(tmp_file.name)

        return file_url
    except requests.RequestException as e:
        raise Exception(f"Failed to download or upload image. Error: {e}") from e

# Example URL causing issues
image_url = "https://www.net-a-porter.com/variants/images/1647597326276381/in/w1365_a3-4_q60.jpg"

This issue occurs when trying to download an image from www.net-a-porter.com. The timeout is set to 120 seconds, which I assumed would be more than enough.
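One detail that may help narrow this down: in requests, the read timeout is not a limit on the total download time. Per the requests documentation, it is how long the client waits for the server to send bytes (for the response, or between chunks when streaming). A ReadTimeout therefore generally means the TCP connection to the server succeeded and then no data arrived for the whole timeout, which points at the server holding back its response rather than at EC2 networking as such. A minimal sketch, where the helper name `probe` and the default timeout values are illustrative assumptions, passes separate (connect, read) timeouts and reports which stage failed:

```python
import requests


def probe(url, headers=None, connect_timeout=5.0, read_timeout=15.0):
    """Fetch `url` once and report which stage, if any, timed out.

    Returns "connect-timeout", "read-timeout", or the HTTP status code.
    (Hypothetical diagnostic helper; timeout defaults are arbitrary.)
    """
    try:
        # A (connect, read) tuple sets the two timeouts separately.
        # stream=True returns as soon as the status line and headers arrive,
        # so the body is never downloaded.
        with requests.get(url, headers=headers, stream=True,
                          timeout=(connect_timeout, read_timeout)) as response:
            return response.status_code
    except requests.exceptions.ConnectTimeout:
        # TCP connection could not be established in time
        return "connect-timeout"
    except requests.exceptions.ReadTimeout:
        # Connected, but the server sent nothing within read_timeout
        return "read-timeout"
```

Running `probe(image_url, headers=headers)` with the same headers both locally and on the EC2 instance shows whether the two environments fail at the same stage, without waiting 100+ seconds each time.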

What I've tried so far:

Any insights or suggestions on how to resolve this issue would be greatly appreciated.


Solution

  • Testing showed that the web server responds when a specific set of headers is added. I'm not sure whether this behavior is intentional. I changed the User-Agent and added the additional headers below, and then got a response:

    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
        'Accept': 'image/avif,image/webp,image/apng,image/*,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.5',
        'Accept-Encoding': 'gzip, deflate, br',
        'Connection': 'keep-alive'
    }

    Could you try with these headers?
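If those headers do get a response, one way to reuse them for every download is a `requests.Session` that sends them by default. This is a sketch under assumptions: the names `BROWSER_HEADERS`, `make_browser_session` and `download_bytes` are hypothetical, the (5, 30) timeout is arbitrary, and `br` is dropped from Accept-Encoding because urllib3 only decodes Brotli when the optional `brotli`/`brotlicffi` package is installed; without it a Brotli-encoded body would be saved still compressed. Whether a given server accepts these headers, from EC2 or anywhere else, is not guaranteed.

```python
import requests

# Header values from the answer, minus "br" in Accept-Encoding
# (assumption: the brotli package may not be installed).
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Accept": "image/avif,image/webp,image/apng,image/*,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
    "Connection": "keep-alive",
}


def make_browser_session():
    """Return a Session that sends BROWSER_HEADERS on every request (hypothetical helper)."""
    session = requests.Session()
    # update() overrides requests' default values for the same header names
    session.headers.update(BROWSER_HEADERS)
    return session


def download_bytes(session, url, timeout=(5, 30)):
    """Stream `url` with the session's headers and return (body, content_type)."""
    with session.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        content_type = response.headers.get("Content-Type", "image/jpeg")
        # iter_content transparently decodes gzip/deflate bodies
        body = b"".join(response.iter_content(chunk_size=8192))
    return body, content_type
```

The returned body and content type can then go to the existing S3 upload step; reusing one session also keeps the connection open across several image downloads.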