python urllib urlparse

How to extract a filename from a URL and append a word to it?


I have the following URL:

url = "http://photographs.500px.com/kyle/09-09-201315-47-571378756077.jpg"

I would like to extract the file name in this URL: 09-09-201315-47-571378756077.jpg

Once I get this file name, I'm going to save it with this name to the Desktop.

filename = **extracted file name from the url**     
download_photo = urllib.urlretrieve(url, "/home/ubuntu/Desktop/%s.jpg" % (filename))

After this, I'm going to resize the photo. Once that is done, I'm going to save the resized version and append the word "_small" to the end of the filename.

downloadedphoto = Image.open("/home/ubuntu/Desktop/%s.jpg" % (filename))
resize_downloadedphoto = downloadedphoto.resize((300, 300), Image.ANTIALIAS)
resize_downloadedphoto.save("/home/ubuntu/Desktop/%s.jpg" % (filename + "_small"))

From this, what I am trying to achieve is to get two files, the original photo with the original name, then the resized photo with the modified name. Like so:

09-09-201315-47-571378756077.jpg

rename to:

09-09-201315-47-571378756077_small.jpg

How can I go about doing this?


Solution

  • You can use urllib.parse.urlparse with os.path.basename:

    import os
    from urllib.parse import urlparse
    
    url = "http://photographs.500px.com/kyle/09-09-201315-47-571378756077.jpg"
    a = urlparse(url)
    print(a.path)                    # Output: /kyle/09-09-201315-47-571378756077.jpg
    print(os.path.basename(a.path))  # Output: 09-09-201315-47-571378756077.jpg
    

    Your URL might contain percent-encoded characters such as %20 for a space or %E7%89%B9%E8%89%B2 for "特色". If so, decode them with unquote (or unquote_plus, which also turns "+" into a space). You can also use pathlib.Path(...).name instead of os.path.basename; a Path object makes it easy to add a suffix to the name, as asked in the question (Path.with_stem requires Python 3.9 or later):

    from pathlib import Path
    from urllib.parse import urlparse, unquote
    
    url = "http://photographs.500px.com/kyle/09-09-2013%20-%2015-47-571378756077.jpg"
    url_parsed = urlparse(url)
    print(unquote(url_parsed.path))  # Output: /kyle/09-09-2013 - 15-47-571378756077.jpg
    file_path = Path("/home/ubuntu/Desktop/") / unquote(Path(url_parsed.path).name)
    print(file_path)        # Output: /home/ubuntu/Desktop/09-09-2013 - 15-47-571378756077.jpg
    
    new_file = file_path.with_stem(file_path.stem + "_small")
    print(new_file)         # Output: /home/ubuntu/Desktop/09-09-2013 - 15-47-571378756077_small.jpg
    

    Alternatively, you can use unquote(urlparse(url).path.split("/")[-1]).
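
The pieces above can be combined into two small helpers. This is only a sketch: the names filename_from_url and with_suffix_word are made up for illustration, and the "?size=large" query string is added to the question's URL to show that urlparse keeps it out of the path. It uses with_name rather than with_stem, so it should also work on Python versions older than 3.9.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse, unquote


def filename_from_url(url):
    """Return the decoded last path component of a URL (hypothetical helper).

    urlparse() separates the query string and fragment from the path,
    so they never end up in the file name.
    """
    path = unquote(urlparse(url).path)
    # URL paths always use "/", so PurePosixPath is used regardless of the OS.
    return PurePosixPath(path).name


def with_suffix_word(path, word):
    """Insert `word` before the extension: photo.jpg -> photo_small.jpg."""
    path = Path(path)
    return path.with_name(path.stem + word + path.suffix)


# The query string is an assumption added for illustration only.
url = "http://photographs.500px.com/kyle/09-09-201315-47-571378756077.jpg?size=large"
target_dir = Path("/home/ubuntu/Desktop")

original = target_dir / filename_from_url(url)
small = with_suffix_word(original, "_small")

print(original.name)  # 09-09-201315-47-571378756077.jpg
print(small.name)     # 09-09-201315-47-571378756077_small.jpg
```

The two paths can then be handed to the download and resize steps from the question, for example urllib.request.urlretrieve(url, str(original)) on Python 3 and resize_downloadedphoto.save(str(small)). Because the extracted name already ends in ".jpg", there is no need to append the extension again.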