I am trying to download more than 100 PDFs from a website using Python. However, the PDFs are hidden behind a selection option. For example:
Then, if I choose Option 1, I see something like this:
Once I click on, e.g., "Clickable link to File 1", a picture pops up with a "View PDF" option in the top right corner of the pop-up. How do I download the PDFs in a loop for each of the files under Option 1? I am new to web scraping and your help will be greatly appreciated.
Thanks!
It seems that you can construct the PDF URL automatically from the link identifier (the `data-dguid` attribute of each link). For example:
import requests
from bs4 import BeautifulSoup
url = "https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/prof/search-recherche/lst/results-resultats.cfm?Lang=E&TABID=1&G=1&Geo1=&Code1=&Geo2=&Code2=&GEOCODE=35&type=0"
map_url = "https://www12.statcan.gc.ca/census-recensement/geo/maps-cartes/pdf/{id1}/{id2}.pdf"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
for a in soup.select("a[data-dguid]"):
    id_ = a["data-dguid"]
    m = map_url.format(id1=id_[4:9], id2=id_)
    print("{:<60} {}".format(a["data-geoname"], m))
Prints:
...
Map: Arthur [Population center], Ontario https://www12.statcan.gc.ca/census-recensement/geo/maps-cartes/pdf/S0510/2016S05100022.pdf
Map: Atikokan [Population center], Ontario https://www12.statcan.gc.ca/census-recensement/geo/maps-cartes/pdf/S0510/2016S05100028.pdf
Map: Attawapiskat 91A [Population center], Ontario https://www12.statcan.gc.ca/census-recensement/geo/maps-cartes/pdf/S0510/2016S05101497.pdf
Map: Aylmer [Population center], Ontario https://www12.statcan.gc.ca/census-recensement/geo/maps-cartes/pdf/S0510/2016S05100030.pdf
Map: Ayr [Population center], Ontario https://www12.statcan.gc.ca/census-recensement/geo/maps-cartes/pdf/S0510/2016S05100031.pdf
Map: Azilda [Population center], Ontario https://www12.statcan.gc.ca/census-recensement/geo/maps-cartes/pdf/S0510/2016S05101498.pdf
Map: Ballantrae [Population center], Ontario https://www12.statcan.gc.ca/census-recensement/geo/maps-cartes/pdf/S0510/2016S05101370.pdf
Map: Barrie [Population center], Ontario https://www12.statcan.gc.ca/census-recensement/geo/maps-cartes/pdf/S0510/2016S05100043.pdf
Map: Barry's Bay [Population center], Ontario https://www12.statcan.gc.ca/census-recensement/geo/maps-cartes/pdf/S0510/2016S05100044.pdf
Map: Bath [Population center], Ontario https://www12.statcan.gc.ca/census-recensement/geo/maps-cartes/pdf/S0510/2016S05101403.pdf
...
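To actually save the files rather than just print their URLs, something like the following might work. This is only a sketch: the helper names `build_map_url` and `download_pdfs`, the `pdfs` output folder, the one-second pause, and the choice to skip any non-200 response are my own assumptions, not anything the site documents.

```python
import os
import time

MAP_URL = "https://www12.statcan.gc.ca/census-recensement/geo/maps-cartes/pdf/{id1}/{id2}.pdf"


def build_map_url(dguid):
    # Same slicing as in the snippet above: characters 4-8 name the folder.
    return MAP_URL.format(id1=dguid[4:9], id2=dguid)


def download_pdfs(dguids, out_dir="pdfs", session=None, delay=1.0):
    """Save one PDF per identifier into out_dir and return the saved paths."""
    if session is None:
        import requests  # only needed when no session is passed in
        session = requests.Session()
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for dguid in dguids:
        response = session.get(build_map_url(dguid), timeout=30)
        if response.status_code != 200:
            # Assumption: skip anything the server does not return cleanly.
            print("Skipping {}: HTTP {}".format(dguid, response.status_code))
            continue
        path = os.path.join(out_dir, dguid + ".pdf")
        with open(path, "wb") as f:
            f.write(response.content)
        saved.append(path)
        time.sleep(delay)  # assumed pause, to be polite to the server
    return saved
```

With the `soup` object from above, it could be called as `download_pdfs(a["data-dguid"] for a in soup.select("a[data-dguid]"))`.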