
How can I download a file in Python 3 with urlopen() or add custom headers to urlretrieve()?


tl;dr I want to download a file from a server that only allows certain User-Agents. I managed to get a 200 OK from the site by using the following code:

import urllib.request

opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Interwebs Exploiter 4')]
opener.open(url)

Since the file can be a .pdf, a .zip or another format, I want to download it without parsing or reading it. urlretrieve() seems like a good idea, but it sends the default User-Agent header, which makes the server return a 403 Forbidden.

How can I either download the file by using that custom-built opener or simply add headers to urlretrieve()?

And this example in the Python Docs is complete gibberish to me.


Solution

  • I would use requests for that:

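    import requests

    # url and filename are assumed to be defined elsewhere
    headers = {'User-Agent': 'Interwebs Exploiter 4'}

    # stream=True keeps the whole file from being loaded into memory first
    r = requests.get(url, allow_redirects=True, headers=headers, stream=True)
    with open(filename, 'wb') as f:
        for chunk in r.iter_content(1024):
            f.write(chunk)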
    

    Unless it's absolutely essential for some reason to use urllib.
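
If urllib does turn out to be essential, something along these lines might work with the standard library only. This is a rough sketch, not a tested recipe for the site in the question: the `download` helper and its parameter names are made up for illustration. It passes the header through a `urllib.request.Request` object and copies the response to disk with `shutil.copyfileobj`, so the content is never parsed.

```python
import shutil
import urllib.request


def download(url, filename, user_agent="Interwebs Exploiter 4"):
    """Save the body at `url` to `filename`, sending a custom User-Agent.

    Hypothetical helper for illustration; the names are not from the question.
    """
    # A Request object carries per-request headers, so no custom opener is needed.
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    # urlopen raises urllib.error.HTTPError for error responses such as 403.
    with urllib.request.urlopen(request) as response, open(filename, "wb") as out:
        # Copy the raw bytes in chunks; nothing is decoded or parsed.
        shutil.copyfileobj(response, out)
    return filename
```

It would be called as `download(url, filename)`. If urlretrieve() itself is preferred, passing the opener from the question to `urllib.request.install_opener(opener)` beforehand should also do it, since urlretrieve() goes through the globally installed opener.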