In the search results of the JobQuest site (http://jobquest.detma.org/JobQuest/Training.aspx), I would like to use Selenium to click the "Next" link so that the next paginated results table of 20 records loads. I can only scrape as far as the first 20 results. Here are the steps that got me that far:
Step 1: I load the opening page.
import requests, re
from bs4 import BeautifulSoup
from selenium import webdriver
browser = webdriver.Chrome('../chromedriver')
url ='http://jobquest.detma.org/JobQuest/Training.aspx'
browser.get(url)
Step 2: I find the search button and click it to request a search with no search criteria. After this code, the search results page loads with the first 20 records in a table:
submit_button = browser.find_element_by_id('ctl00_ctl00_bodyMainBase_bodyMain_btnSubmit')
submit_button.click()
Step 3: Now on the search results page, I create some soup and use "find_all" to get the correct rows:
html = browser.page_source
soup = BeautifulSoup(html, "html.parser")
rows = soup.find_all("tr",{"class":"gvRow"})
At this point, I can fetch my data (job IDs) from the first page of results using the rows object like this:
id_list = []
for row in rows:
    temp = str(row.find("a"))[33:40]
    id_list.append(temp)
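As an aside, slicing the string representation of the anchor at fixed offsets is brittle. Here is a sketch of a slightly more robust version, assuming the job ID is the run of digits inside the link's href (re is already imported above):

id_list = []
for row in rows:
    link = row.find("a")
    if link and link.get("href"):
        # Pull the numeric job ID out of the href instead of slicing at fixed offsets.
        match = re.search(r"\d+", link["href"])
        if match:
            id_list.append(match.group())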
QUESTION - Step 4, help! To reload the table with the next 20 results, I have to click the "Next" link on the results page. I used Chrome to inspect it and got these details:
<a href="javascript:__doPostBack('ctl00$ctl00$bodyMainBase$bodyMain$egvResults$ctl01$ctl08','')">Next</a>
I need code to programmatically click on Next and remake the soup with the next 20 records. I expect that if I could figure this out, I can figure out how to loop the code to get all ~1515 IDs in the database.
UPDATE: The line that worked for me, suggested in the answer, is:
WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, '[href*=ctl08]'))).click()
Thank you, this was very useful.
You can use an attribute = value selector to target the href. In this case I use the substring at the end of the href via the contains (*) operator.
WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, '[href*=ctl08]'))).click()
I add in a wait-for-clickable condition as a precautionary measure. You could probably remove that.
Additional imports
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
Without the wait condition:
browser.find_element_by_css_selector('[href*=ctl08]').click()
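To loop over all the pages with this approach, here is a minimal sketch. It assumes the Next link keeps the ctl08 substring in its href on every page and is no longer clickable on the last page (so the timeout is used to stop), and that waiting for the old rows to go stale is enough to know the next page has loaded:

from selenium.common.exceptions import TimeoutException

id_list = []
while True:
    # Scrape the rows on the current page.
    soup = BeautifulSoup(browser.page_source, "html.parser")
    for row in soup.find_all("tr", {"class": "gvRow"}):
        id_list.append(str(row.find("a"))[33:40])
    try:
        next_link = WebDriverWait(browser, 10).until(
            EC.element_to_be_clickable((By.CSS_SELECTOR, '[href*=ctl08]')))
    except TimeoutException:
        # Assumption: no clickable Next link means this was the last page.
        break
    old_row = browser.find_element_by_css_selector('tr.gvRow')
    next_link.click()
    # Wait for the old rows to go stale so the new page is loaded before re-souping.
    WebDriverWait(browser, 10).until(EC.staleness_of(old_row))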
Another way:
Now, instead, you could initially set the page results count to 100 (the max) and then loop through the dropdown of result pages to load each new page (then you don't need to worry about how many pages there are).
import requests, re
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
browser = webdriver.Chrome()
url ='http://jobquest.detma.org/JobQuest/Training.aspx'
browser.get(url)
submit_button = browser.find_element_by_id('ctl00_ctl00_bodyMainBase_bodyMain_btnSubmit')
submit_button.click()
WebDriverWait(browser, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, '[value="100"]'))).click()
html = browser.page_source
soup = BeautifulSoup(html, "html.parser")
rows = soup.find_all("tr",{"class":"gvRow"})
id_list = []
for row in rows:
    temp = str(row.find("a"))[33:40]
    id_list.append(temp)
elems = browser.find_elements_by_css_selector('#ctl00_ctl00_bodyMainBase_bodyMain_egvResults select option')
i = 1
while i < len(elems) / 2:
    browser.find_element_by_css_selector('#ctl00_ctl00_bodyMainBase_bodyMain_egvResults select option[value="' + str(i) + '"]').click()
    # do stuff with new page
    i += 1
It is up to you what you do about extracting the row info from each page; this was to give you an easy framework for looping over all the pages.
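For instance, the "# do stuff with new page" placeholder could be filled in roughly as follows. This is only a sketch: it assumes a crude time.sleep is enough for each postback to finish before re-reading the page source (a WebDriverWait on a staleness condition would be more robust):

import time

i = 1
while i < len(elems) / 2:
    browser.find_element_by_css_selector(
        '#ctl00_ctl00_bodyMainBase_bodyMain_egvResults select option[value="' + str(i) + '"]').click()
    time.sleep(2)  # crude pause so the new rows are present before re-souping
    soup = BeautifulSoup(browser.page_source, "html.parser")
    for row in soup.find_all("tr", {"class": "gvRow"}):
        id_list.append(str(row.find("a"))[33:40])  # keep appending to id_list from the first page
    i += 1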