python archive image download

Downloading and renaming images from multiple URLs with the same file name


I am trying to download images from an archive. I have the image URLs and can download each file successfully using the code below. However, some of the images share the same file name (e.g. compressed.jpg), so each download overwrites the previous one and I end up with a single compressed.jpg file.

I want to be able to rename these files on download so I end up with compressed1.jpg, compressed2.jpg etc. I am very new to Python so am getting myself into a complete mess trying to add incremental numbers to the end of the file names.

Thank you

import requests

image_url = [
    'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/103/975/thumbnail/compressed.jpg',
    'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/105/093/thumbnail/compressed.jpg',
    'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/103/984/thumbnail/compressed.jpg',
    'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/107/697/thumbnail/compressed.jpg'
]
for img in image_url:
    file_name = img.split('/')[-1]
    print("Downloading file:%s" % file_name)
    r = requests.get(img, stream=True)
    # every URL ends in compressed.jpg, so each download overwrites the previous file
    with open(file_name, 'wb') as f:
        for chunk in r:
            f.write(chunk)

I have tried using os and glob to rename the files, but with no luck. How can I give each file a new name as it is downloaded?


Solution

  • Add a number to each file name as you save it. To get a running index inside your for loop, call enumerate on the image_url list. Then use os.path.splitext to split the file name into its name and extension (it returns a tuple, e.g. ('compressed', '.jpg')) and put the number between the two parts.

    import os.path

    import requests

    image_url = [
        'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/103/975/thumbnail/compressed.jpg',
        'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/105/093/thumbnail/compressed.jpg',
        'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/103/984/thumbnail/compressed.jpg',
        'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/107/697/thumbnail/compressed.jpg'
    ]

    # start=1 makes the numbering begin at 1 instead of 0
    for index, img in enumerate(image_url, start=1):
        # split 'compressed.jpg' into ('compressed', '.jpg')
        name, extension = os.path.splitext(img.split('/')[-1])
        target_file = f"{name}{index}{extension}"  # e.g. compressed1.jpg
        print(f"Downloading file: {target_file}")
        r = requests.get(img, stream=True)
        r.raise_for_status()  # fail on HTTP errors rather than saving an error page
        with open(target_file, 'wb') as f:
            # write the response body to disk in chunks
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
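
If the list ever mixes different file names, one possible variation is to count each name separately, so that numbering restarts for every distinct name. The sketch below only builds the target names and does not download anything. The helper name numbered_names and the example.com URLs are made up for illustration and are not part of the original question or answer.

```python
import os.path
from urllib.parse import urlsplit


def numbered_names(urls):
    """Return one target file name per URL, numbering each base name separately."""
    seen = {}  # base file name -> how many times it has appeared so far
    names = []
    for url in urls:
        # take the last path segment, ignoring any query string
        base = os.path.basename(urlsplit(url).path)
        stem, ext = os.path.splitext(base)
        seen[base] = seen.get(base, 0) + 1
        names.append(f"{stem}{seen[base]}{ext}")
    return names


# placeholder URLs, used only to show the naming
example_urls = [
    "https://example.com/a/compressed.jpg",
    "https://example.com/b/compressed.jpg",
    "https://example.com/c/logo.png?size=small",
]
print(numbered_names(example_urls))
# ['compressed1.jpg', 'compressed2.jpg', 'logo1.png']
```

Each returned name could then be passed to open(..., 'wb') in the download loop in place of the enumerate-based name. This is only a sketch: it does not check whether a file with the generated name already exists on disk.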