
How to download a file from a website using Python


I am trying to use Python to download the following image file. The file is on a website that requires a login, but there are two links to it.

Below is the secure link (with the token replaced by #####): https://assetplanner.com/files/ServiceRequest/616/img_20241021093256.jpg?token=#####

Below is the publicly accessible link: https://assetplanner.com/files/?f=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJsaW5rIjoiZmlsZXNcL1NlcnZpY2VSZXF1ZXN0XC82MTZcL2ltZ18yMDI0MTAyMTA5MzIyOC5qcGciLCJ0b2tlbiI6IlQ1MDIwOVoxMDAyIiwidWlkIjoiMjgwNDQ1IiwiZHQiOjE3Mzg2NDE3MTB9.d29xUb8q914dJmZnDCzTp5DY1x-PKIfAwhrpiq9kDUQ

I've tried urlretrieve

from urllib.request import urlretrieve
url='https://tor.assetplanner.com/service_request?ID=230'   
filename = "changed.pdf"

This does nothing.
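The snippet above imports urlretrieve but never calls it, so no request is ever made. A minimal sketch of the missing call is below; save_url is a hypothetical helper name. Note that the URL in that snippet is the service request page itself, so even when called, this would most likely save that page's HTML (or a login page) under the name changed.pdf rather than an actual PDF.

```python
from pathlib import Path
from urllib.request import urlretrieve


def save_url(url: str, filename: str) -> Path:
    """Download url to filename with urlretrieve and return the saved path."""
    # urlretrieve returns (local_filename, headers); only the path is needed here
    local_path, _headers = urlretrieve(url, filename)
    return Path(local_path)
```

For example, save_url('https://tor.assetplanner.com/service_request?ID=230', 'changed.pdf') would make the request that the original snippet skipped.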

I've tried urllib.request:

import urllib.request
url = 'https://imgs.xkcd.com/comics/python.png'
urllib.request.urlretrieve(url, fr"C:\Users\location\xkcd_comic.png")

This works for the image from the example I found, but not when I try it on the website I want:

import urllib.request
url='https://assetplanner.com/files/ServiceRequest/153/changed.jpg?token=#####'
urllib.request.urlretrieve(url, fr"C:\Users\location\xattempt.png")

This one returns a 14 KB file named xattempt.png.

I've tried requests

import requests
url = 'https://tor.assetplanner.com/service_request?ID=616'
r = requests.get(url)
print(r.json())
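r.json() only succeeds when the response body is JSON. A service_request page is presumably an ordinary HTML page (or, without a login, a login page), so calling r.json() on it raises an exception. The sketch below checks the Content-Type header before decoding; parse_response_body is a hypothetical helper, not part of requests.

```python
import json


def parse_response_body(content_type: str, body: bytes):
    """Decode body as JSON only when the Content-Type says it is JSON.

    Anything else (for example an HTML login page) is returned as text.
    """
    # Content-Type may carry parameters, e.g. "application/json; charset=utf-8"
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/json" or media_type.endswith("+json"):
        return json.loads(body)
    return body.decode("utf-8", errors="replace")
```

With requests you would call it as parse_response_body(r.headers.get('Content-Type', ''), r.content).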

I also tried requests this way:

import requests
url = 'https://assetplanner.com/files/ServiceRequest/230/changed.pdf?token=#####'

response = requests.get(url)
file_Path = 'research_Paper_1.pdf'

if response.status_code == 200:
    with open(file_Path, 'wb') as file:
        file.write(response.content)
    print('File downloaded successfully')
else:
    print('Failed to download file')

And another way with requests:

import requests

url = "https://assetplanner.com/files/ServiceRequest/230/changed.pdf?token=#####"
query_parameters = {"downloadformat": "pdf"}
response = requests.get(url, params=query_parameters)
print(response.url)
print(response.ok)
print(response.status_code)
with open("#####.pdf", mode="wb") as file:
    file.write(response.content)

But they all produce a 14 KB file that doesn't work.
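Getting a file of roughly the same 14 KB size from every URL suggests, though this is an assumption, that the server returns the same HTML page (most likely a login or error page) instead of the requested file. One way to check is to look at the first bytes of the saved file, since JPEG, PNG and PDF files start with fixed signatures. sniff_file_type and sniff_path are hypothetical helper names.

```python
# Leading "magic" bytes of the file formats we expect to download
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
}


def sniff_file_type(data: bytes) -> str:
    """Guess a file's type from its first bytes."""
    for magic, kind in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    # Skip a UTF-8 BOM and leading whitespace, then look for HTML markup
    head = data[:512].lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"


def sniff_path(path: str) -> str:
    """Read the start of a downloaded file and report its apparent type."""
    with open(path, "rb") as f:
        return sniff_file_type(f.read(512))
```

If sniff_path on xattempt.png or one of the 14 KB PDFs reports "html", the download was a web page rather than the file; opening it in a text editor should show which page it is.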

I've tried wget

import wget
url = 'https://tor.assetplanner.com/service_request?ID=153'
wget.download(url, 'img_#####.jpg')

I end up with a 14 KB img_#####.jpg.

I know this is difficult, but is it obvious what I am doing wrong, or does anyone know a way to do it? I am using Python to write a program where you type in report numbers, and it goes to those reports and downloads their files (PDF or JPG).
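The program described above could be sketched as two steps: turn the typed report numbers into report page URLs, then pull the file links out of each report page's HTML. This assumes the report pages follow the https://tor.assetplanner.com/service_request?ID=... pattern used in the question and that the file links appear as href or src attributes containing /files/ServiceRequest/; both are guesses about the site. REPORT_URL, report_urls, FileLinkParser and extract_file_links are hypothetical names, and fetching the pages still requires a logged-in session.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin

# Assumed URL pattern for a report page, taken from the question
REPORT_URL = "https://tor.assetplanner.com/service_request?ID={}"


def report_urls(report_numbers: str) -> List[str]:
    """Turn input such as "153, 230 616" into report page URLs."""
    ids = report_numbers.replace(",", " ").split()
    # int() rejects anything that is not a plain report number
    return [REPORT_URL.format(int(i)) for i in ids]


class FileLinkParser(HTMLParser):
    """Collect href/src values that point into /files/ServiceRequest/."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value and "/files/ServiceRequest/" in value:
                # Resolve relative links against the report page URL
                absolute = urljoin(self.base_url, value)
                if absolute not in self.links:  # keep order, drop duplicates
                    self.links.append(absolute)


def extract_file_links(html: str, base_url: str) -> List[str]:
    """Return the unique file links found in a report page's HTML."""
    parser = FileLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

The HTML passed to extract_file_links would come from a logged-in session (for example session.get(url).text with requests), and each returned link could then be downloaded with that same session.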

I thought it might have something to do with needing to log in, but the publicly accessible link doesn't need that.

EDIT: Simon answered the question. I still have to work on it a bit more, but his answer works.

To answer other questions: yes, the files are behind a login. The secure website has two links to each photo: one that is permanent but requires logging in, and one that is public but expires after a period of time.

I never tried to use the public link because I have no idea how to programmatically figure out what the link is. The secure link has a pattern and I can figure it out from the website code.

I was also limited in what I could post because there were tokens, and as far as I can tell they are similar to my login, which I obviously can't make public.

Sorry for the difficult question, and thank you for figuring it out.


Solution

  • It looks like the issue is authentication: the file is behind a login, so a request made without your login cookies does not get the file. Log in first using a session object, which stores the cookies the website sets so that you stay logged in, and then use that same session to request the file.

    Using the requests library:

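    import sys
    
    import requests
    
    # Start a session so cookies set at login are sent with later requests
    session = requests.Session()
    
    # Log in to the website.
    # The login URL and form field names below are placeholders: check the
    # site's actual login form (e.g. in the browser's developer tools).
    login_url = 'https://assetplanner.com/login'
    login_data = {
        'username': 'your_username',
        'password': 'your_password'
    }
    
    login_response = session.post(login_url, data=login_data)
    
    # A 2xx status alone does not prove the login worked; many sites return
    # 200 with the login page again when the credentials are rejected.
    if not login_response.ok:
        sys.exit(f"Login failed! Status code: {login_response.status_code}")
    print("Login request sent.")
    
    # Now, download the file using the same session
    file_url = 'https://assetplanner.com/files/ServiceRequest/616/img_20241021093256.jpg?token=#####'
    
    response = session.get(file_url)
    content_type = response.headers.get('Content-Type', '')
    
    # An HTML response here usually means we got a login page, not the file
    if response.status_code == 200 and not content_type.startswith('text/html'):
        with open('downloaded_image.jpg', 'wb') as file:
            file.write(response.content)
        print("File downloaded successfully!")
    else:
        print(f"Failed to download file. Status code: {response.status_code}, "
              f"Content-Type: {content_type}")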
    

    This maintains your logged-in session and allows you to download the file.

    I hope this helps!
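The login URL and the 'username' and 'password' field names in the code above are guesses. Many login forms use different field names and also include hidden inputs, such as CSRF tokens, that must be posted back for the login to succeed. Whether AssetPlanner's form does this is an assumption; one way to find out is to read the form from the login page itself. LoginFormParser and read_login_form are hypothetical names, and this sketch only looks at the first form on the page.

```python
from html.parser import HTMLParser
from typing import Dict, Optional


class LoginFormParser(HTMLParser):
    """Collect the action URL and input fields of the first <form> on a page."""

    def __init__(self):
        super().__init__()
        self.action: Optional[str] = None
        self.fields: Dict[str, str] = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form:
            name = attrs.get("name")
            if name:
                # Keep default values: hidden inputs (e.g. CSRF tokens) must be sent back
                self.fields[name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True  # ignore any later forms on the page


def read_login_form(html: str):
    """Return (action, fields) for the first form in html."""
    parser = LoginFormParser()
    parser.feed(html)
    parser.close()
    return parser.action, parser.fields
```

With requests, you could GET the login page with the same session, pass page.text to read_login_form, fill in your username and password under the field names the form actually uses, and POST the resulting dictionary to the form's action URL, resolved against the login page URL with urllib.parse.urljoin.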