This website serves its questions as images, and I need to scrape them. However, I cannot get a link to the actual image files: my script only outputs links to some loading GIFs. When I looked at the page source, the images did not even have a "src" pointing to the real files. You can see how the website works at the link provided above. How can I download all these images?
from bs4 import BeautifulSoup
import requests
import os
url = "https://www.exam-mate.com/topicalpastpapers/?cat=3&subject=22&years=&seasons=&paper=&zone=&chapter=&order=asc0"
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
images = soup.find_all('img')
for image in images:
    link = image['src']
    print(link)
The question IDs are embedded in the page itself, in the onclick attribute of each link. Try extracting them with the re (regex) module:
import re
import requests
from bs4 import BeautifulSoup
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
}
URL = "https://www.exam-mate.com/topicalpastpapers/?cat=3&subject=22&years=&seasons=&paper=&zone=&chapter=&order=asc0"
BASE_URL = "https://www.exam-mate.com"
soup = BeautifulSoup(requests.get(URL, headers=headers).content, "html.parser")
for tag in soup.select("td:nth-of-type(1) a"):
    # Find the question id within the page
    question_link = re.search(r"/questions.*\.png", tag["onclick"]).group()
    print(BASE_URL + question_link)
Output:
https://www.exam-mate.com/questions/1240/1362/1240_q_1362_1_1.png
https://www.exam-mate.com/questions/1240/1363/1240_q_1363_2_1.png
https://www.exam-mate.com/questions/1240/1364/1240_q_1364_3_1.png
https://www.exam-mate.com/questions/1240/1365/1240_q_1365_4_1.png
https://www.exam-mate.com/questions/1240/1366/1240_q_1366_5_1.png
...and so on
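The code above only prints the links. To actually save the files, something like the following might work. It is a rough sketch, not tested against the live site: the helper names `extract_image_urls` and `download_images`, the `questions` output folder, and the 30-second timeout are my own choices, and the regex is a slightly stricter variant of the one above that assumes each path sits inside quotes in the onclick handler.

```python
import os
import re
from urllib.parse import urljoin

BASE_URL = "https://www.exam-mate.com"


def extract_image_urls(html, base_url=BASE_URL):
    """Return absolute URLs for every /questions/...png path found in html."""
    # Stop at quotes, whitespace, or a closing parenthesis so that one match
    # never runs across two separate paths on the same line.
    paths = re.findall(r"/questions/[^'\"\s)]+?\.png", html)
    # dict.fromkeys removes duplicates while keeping the original order.
    return [urljoin(base_url, path) for path in dict.fromkeys(paths)]


def download_images(urls, out_dir="questions", session=None):
    """Download each URL into out_dir and return the list of saved file paths."""
    if session is None:
        # Imported here so extract_image_urls works without requests installed.
        import requests
        session = requests.Session()
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for url in urls:
        response = session.get(url, timeout=30)
        response.raise_for_status()  # fail loudly on 404 and similar errors
        path = os.path.join(out_dir, os.path.basename(url))
        with open(path, "wb") as fh:
            fh.write(response.content)
        saved.append(path)
    return saved
```

You would call it with the page HTML fetched as above, e.g. `download_images(extract_image_urls(requests.get(URL, headers=headers).text))`. If the site rejects requests without a browser user-agent, you could pass in a `requests.Session()` whose headers have been updated with the `headers` dict.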