I am trying to download images from an archive. I have the image URLs and can successfully download each file using the code below. However, some of the images share the same name (e.g. compressed.jpg), so when I run the script only one compressed.jpg file is created, because each download overwrites the previous one.
I want to rename these files as they are downloaded so that I end up with compressed1.jpg, compressed2.jpg, etc. I am very new to Python, so I am getting myself into a complete mess trying to add incremental numbers to the end of the file names.
Thank you
import requests

image_url = [
    'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/103/975/thumbnail/compressed.jpg',
    'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/105/093/thumbnail/compressed.jpg',
    'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/103/984/thumbnail/compressed.jpg',
    'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/107/697/thumbnail/compressed.jpg'
]

for img in image_url:
    file_name = img.split('/')[-1]
    print("Downloading file: %s" % file_name)
    r = requests.get(img, stream=True)
    # this should be file_name variable instead of "file_name" string
    with open(file_name, 'wb') as f:
        for chunk in r:
            f.write(chunk)
I have tried using os and glob to rename the files, but with no luck. How can I give each file a unique name before it is saved?
You just add an index to the filename. To get the index inside your for loop, use enumerate() on the image_url list. Then split the filename into its name and extension with os.path.splitext(), which returns a (root, extension) tuple, and put the index number between the two parts.
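To see the naming step on its own, without downloading anything, here is a minimal sketch. The helper name `target_names` is hypothetical (it is not part of `os` or `requests`), and it only builds the local file names from the URLs:

```python
import os.path


def target_names(urls):
    """Map each URL to a numbered local file name, counting from 1."""
    names = []
    for index, url in enumerate(urls, start=1):
        # 'https://.../thumbnail/compressed.jpg' -> 'compressed.jpg'
        file_name = url.split('/')[-1]
        # splitext returns a (root, extension) tuple: ('compressed', '.jpg')
        root, ext = os.path.splitext(file_name)
        names.append(f"{root}{index}{ext}")
    return names


urls = [
    'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/103/975/thumbnail/compressed.jpg',
    'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/105/093/thumbnail/compressed.jpg',
]
print(target_names(urls))  # ['compressed1.jpg', 'compressed2.jpg']
```

The complete script applies the same two steps inside the download loop: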
import os.path

import requests

image_url = [
    'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/103/975/thumbnail/compressed.jpg',
    'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/105/093/thumbnail/compressed.jpg',
    'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/103/984/thumbnail/compressed.jpg',
    'https://s3-eu-west-1.amazonaws.com/sheffdocfest.com/attachments/data/000/107/697/thumbnail/compressed.jpg'
]

for index, img in enumerate(image_url, start=1):
    file_name_string = img.split('/')[-1]
    # splitext returns a (root, extension) tuple, e.g. ('compressed', '.jpg')
    name, ext = os.path.splitext(file_name_string)
    target_file = f"{name}{index}{ext}"
    print("Downloading file: %s" % target_file)
    r = requests.get(img, stream=True)
    with open(target_file, 'wb') as f:
        for chunk in r:
            f.write(chunk)
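One limitation of numbering by position is that running the script a second time overwrites compressed1.jpg, compressed2.jpg and so on. If that matters, a possible variation (a sketch, not the only way to do it) is to pick the first numbered name that does not already exist in the target folder. The helper `unique_path` below is hypothetical, and the demo writes empty placeholder files to a temporary directory instead of real downloads:

```python
import os
import os.path
import tempfile


def unique_path(directory, file_name):
    """Return the first path like name1.ext, name2.ext, ... not yet on disk."""
    name, ext = os.path.splitext(file_name)
    counter = 1
    while True:
        candidate = os.path.join(directory, f"{name}{counter}{ext}")
        if not os.path.exists(candidate):
            return candidate
        counter += 1


# Demo in a throwaway directory, with no network access.
with tempfile.TemporaryDirectory() as tmp:
    for _ in range(3):
        path = unique_path(tmp, 'compressed.jpg')
        # In the real script, the downloaded chunks would be written here.
        with open(path, 'wb') as f:
            f.write(b'')
        print(os.path.basename(path))
```

This prints compressed1.jpg, compressed2.jpg and compressed3.jpg in turn. In the download loop you would use `unique_path('.', img.split('/')[-1])` in place of the index-based `target_file`. Checking for a file and then opening it is not atomic, which is fine for a single script but not for several processes saving into the same folder at once.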