I am trying to download a PDF using a Python script. I have tried urllib, pdfkit, and curl. In every case I get the HTML/JS content of the page instead of the PDF file. How can I solve this?
Using pdfkit:
import pdfkit
pdfkit.from_url('http://www.kubota.com/product/BSeries/B2301/pdf/B01_Specs.pdf', 'out.pdf', options = {'javascript-delay':'10000'})
Using urllib2:
import urllib2
response = urllib2.urlopen('http://www.kubota.com/product/BSeries/B2301/pdf/B01_Specs.pdf')
with open("out.pdf", 'wb') as f:
    f.write(response.read())
You could use the urllib3 library:
import urllib3
def download_file(download_url):
    http = urllib3.PoolManager()
    response = http.request('GET', download_url)
    with open('output.pdf', 'wb') as f:
        f.write(response.data)

if __name__ == '__main__':
    download_file('http://www.kubota.com/product/BSeries/B2301/pdf/B01_Specs.pdf')
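Since the symptom in the question is HTML being saved in place of the PDF, it may help to check what the server actually returned before writing it to disk. The following is only a sketch under some assumptions: it uses Python 3's standard `urllib.request` (the successor to `urllib2`) rather than urllib3 so that it runs without extra installs, the helper names `looks_like_pdf` and `download_pdf` are hypothetical, and the browser-like `User-Agent` value is a guess that may or may not change what this particular server sends back.

```python
import urllib.request


def looks_like_pdf(data):
    """Return True if the payload starts with the PDF header bytes."""
    # A well-formed PDF begins with "%PDF-"; an HTML page will not.
    return data.startswith(b"%PDF-")


def download_pdf(url, path, user_agent="Mozilla/5.0"):
    """Fetch url and save it to path only if the body looks like a PDF."""
    # Assumption: some servers answer non-browser clients with an HTML page,
    # so a browser-like User-Agent is sent. This is not guaranteed to help.
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        content_type = response.headers.get("Content-Type", "")
        data = response.read()

    # Fail loudly instead of silently saving HTML under a .pdf name.
    if not looks_like_pdf(data):
        raise ValueError(
            "Expected a PDF but the server sent Content-Type %r" % content_type
        )

    with open(path, "wb") as f:
        f.write(data)
    return path
```

Calling `download_pdf(url, 'out.pdf')` with the URL from the question would then either save the file or raise a `ValueError` naming the content type that came back. If it reports `text/html`, the URL is probably serving a web page (for example a redirect or a JavaScript viewer) rather than the file itself, and the real PDF link has to be found first.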