python selenium web-scraping dynamic-tables

Scrape data from dynamic table using Python & Selenium


I'm trying to scrape data from this URL: https://qmjhldraft.rinknet.com/results.htm?year=2018, but I can't manage to scrape even a single name from the dynamic table.

Here's the code I currently have:

from selenium import webdriver
PATH = 'C:\Program Files (x86)\chromedriver.exe'
driver = webdriver.Chrome(PATH)
driver.get('https://qmjhldraft.rinknet.com/results.htm?year=2018')
element = driver.find_element_by_xpath('//*[@id="ht-results-table"]/tbody[1]/tr[2]/td[4]').text
print(element)

The code gives me this error:

NoSuchElementException: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="ht-results-table"]/tbody[1]/tr[2]/td[4]"}

There's obviously something wrong with my XPath, but I can't figure out what.

Thanks a lot for the help!


Solution

  • The first problem is that the page loads its table dynamically, so you need to give it time to finish loading before you look up the element. A quick way to do that is a fixed delay:

    import time

    time.sleep(2)  # adjust the delay to suit your connection and the page
    element = driver.find_element_by_xpath('//*[@id="ht-results-table"]/tbody[1]/tr[2]/td[4]').text
    

    A better approach is an explicit wait, which waits until the element is present (up to a timeout) and only then runs the next step.

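    The second problem is the XPath itself: you shouldn't just copy it from the Chrome dev tools. A copied path is tied to exact row and cell positions, so it tends to break when the table's layout differs from what you expect.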

    To get all the names, you can do this:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # Raw string so the backslashes in the Windows path are taken literally
    PATH = r'C:\Program Files (x86)\chromedriver.exe'
    # Selenium 3 style; newer releases take a Service object instead
    driver = webdriver.Chrome(PATH)
    driver.get('https://qmjhldraft.rinknet.com/results.htm?year=2018')

    try:
        # Wait up to 10 seconds for the rows to appear, then collect the
        # third cell of every row that has an rnid attribute
        names = WebDriverWait(driver, 10).until(
            EC.presence_of_all_elements_located((By.XPATH, '//tr[@rnid]/td[3]'))
        )
        for name in names:
            print(name.text)
    finally:
        # Close the browser even if the wait times out
        driver.quit()
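
To see why an attribute-based XPath like //tr[@rnid]/td[3] holds up better than a copied positional one, here is a small offline sketch that needs no browser. It uses the standard library's xml.etree.ElementTree on made-up markup. The sample table, the placeholder player names, and the helper functions player_names and positional_cell are assumptions for illustration only, and the real page may be structured differently. ElementTree supports only a subset of XPath, so the path is written as .//tr[@rnid]/td[3].

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, loosely modelled on a results table: a header row
# and a separator row without an rnid attribute, plus two player rows
# that carry one. The real page may look different.
SAMPLE = """
<table id="ht-results-table">
  <tbody>
    <tr><th>Rd</th><th>Pick</th><th>Player</th></tr>
    <tr rnid="101"><td>1</td><td>1</td><td>Player A</td></tr>
    <tr><td colspan="3">Round 2</td></tr>
    <tr rnid="102"><td>2</td><td>19</td><td>Player B</td></tr>
  </tbody>
</table>
""".strip()


def player_names(markup):
    """Return the text of the third cell of every row that has an rnid."""
    root = ET.fromstring(markup)
    # Same idea as //tr[@rnid]/td[3]: select rows by attribute, not position
    return [td.text for td in root.findall(".//tr[@rnid]/td[3]")]


def positional_cell(markup):
    """Look up a cell by fixed position, like a path copied from dev tools."""
    root = ET.fromstring(markup)
    # Second row, fourth cell of the first tbody; None if there is no such cell
    return root.find("./tbody[1]/tr[2]/td[4]")


print(player_names(SAMPLE))
print(positional_cell(SAMPLE))
```

In this sample the attribute-based path skips the header and separator rows and returns both names, while the positional path finds nothing because the second row has only three cells. The same kind of mismatch can produce a NoSuchElementException in Selenium.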