tl;dr I want to download a file from a server that only allows certain User-Agents. I managed to get a 200 OK from the site with the following code:
import urllib.request

opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Interwebs Exploiter 4')]
opener.open(url)
Since the file can be a .pdf, a .zip, or some other format, I want to download it without parsing or reading it. urlretrieve() seems like a good fit, but it sends the default User-Agent header, so the server returns a 403 Forbidden.
How can I either download the file using that custom-built opener, or simply add headers to urlretrieve()?
And this example in the Python Docs is complete gibberish to me.
I would use requests
for that:
import requests

headers = {'User-Agent': 'Interwebs Exploiter 4'}
# stream=True so the body is fetched in chunks instead of loaded
# into memory all at once
r = requests.get(url, allow_redirects=True, headers=headers, stream=True)
with open(filename, 'wb') as f:
    for chunk in r.iter_content(1024):
        f.write(chunk)
Unless it's absolutely essential for some reason to use urllib, I'd stick with requests.
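If you do have to stay with the standard library, a sketch of one way to answer the original question: install_opener() makes your custom opener the process-wide default, so urlretrieve() picks up its headers automatically. The URL and filename below are placeholders for illustration.

```python
import urllib.request

# Hypothetical values for illustration
url = 'http://example.com/file.pdf'
filename = 'file.pdf'

# Build an opener that sends the custom User-Agent and install it
# process-wide; urlretrieve() then uses it for every request.
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent', 'Interwebs Exploiter 4')]
urllib.request.install_opener(opener)

# With the opener installed, this would send 'Interwebs Exploiter 4'
# instead of the default 'Python-urllib/3.x' header:
#     urllib.request.urlretrieve(url, filename)
```

Note that install_opener() affects every subsequent urllib request in the process, which is why the docs example you linked builds a Request object with per-request headers instead.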