Tags: python, selenium-webdriver

Loop through web elements using Selenium wait strategy


I'm trying to perform what I thought was a fairly simple task. For each element in a list of web elements, I want to find its href, open that page, get some data, and move on to the next element. Here is my function.

def get_ads_dates(driver, base_url):
    objects_dates = []
    wait = WebDriverWait(driver, 5)
    driver.get(base_url)
    # ads - is a list of web elements
    ads = driver.find_elements(By.CLASS_NAME, 'object-cards-block.d-flex.cursor-pointer')
    for ad in ads:
        try:
            ad_link = wait(ad, 5).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'div.main-container-margins.width-100 > a')))        
        except NoSuchElementException:
            print('no such element')
        # following the link
        driver.get(ad_link.get_attribute('href'))
        ad_date = driver.find_element(By.XPATH, '//*[@id="Data"]/div/div/div[4]/div[2]/span').text
        objects_dates.append(ad_date)

I found the wait().until(EC.presence_of_element_located()) construction here and here, but I am getting TypeError: 'WebDriverWait' object is not callable.

Python 3.11, PyCharm.


Solution

  • Once you've called get(), the driver has a "view" of that page. As soon as you call get() again, that "view" is lost: every WebElement collected from the first page (your ads list) becomes stale, and touching one raises a StaleElementReferenceException. The TypeError is a separate mistake: wait is already a WebDriverWait instance, so wait(ad, 5) tries to call the object itself; you construct the wait once and call .until() on it.
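
    For reference, the corrected form of that line would look like this (a sketch reusing the question's own locator; WebDriverWait also accepts a WebElement in place of a driver, which scopes the search to that card):

    # corrected call: construct the wait, then call .until() on it.
    # This alone does not fix the staleness problem described above.
    ad_link = WebDriverWait(ad, 5).until(
        EC.presence_of_element_located(
            (By.CSS_SELECTOR, "div.main-container-margins.width-100 > a")
        )
    )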

    There are many ways to approach this. The simplest (though not the most efficient) is to collect all the HREFs first, then visit the individual pages, as follows:

    from selenium.webdriver import Chrome, ChromeOptions
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.wait import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    
    BASE_URL = "https://upn.ru/kupit/kvartiry"
    
    
    def get_ads_dates(driver, base_url):
        driver.get(base_url)
        wait = WebDriverWait(driver, 10)
        ec = EC.presence_of_all_elements_located
        loc = (By.CLASS_NAME, "object-cards-block.d-flex.cursor-pointer")
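        # harvest every href up front: these card elements go stale
        # the moment the driver navigates to another page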
        hrefs = []
        for a in wait.until(ec(loc)):
            link = a.find_element(
                By.CSS_SELECTOR, "div.main-container-margins.width-100 > a"
            )
            if (href := link.get_attribute("href")) is not None:
                hrefs.append(href)
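        # second pass: navigation is safe now, since we carry plain
        # href strings rather than live WebElements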
        dates = []
        for href in hrefs:
            driver.get(href)
            ec = EC.presence_of_element_located
            loc = (By.XPATH, "//*[@id='Data']/div/div/div[4]/div[2]/span")
            span = wait.until(ec(loc))
            text = span.text
            dates.append(text)
            print(text)
        return dates
    
    
    if __name__ == "__main__":
        options = ChromeOptions()
        options.add_argument("--headless=new")
        with Chrome(options) as driver_:
            ads = get_ads_dates(driver_, BASE_URL)
    

    Output (partial; «Размещено» is Russian for "Posted"):

    Размещено: 16.06.2025 11
    Размещено: 06.06.2025 6
    Размещено: 02.06.2025 57
    Размещено: 19.04.2025 42
    Размещено: 03.04.2025 29
    Размещено: 25.03.2025 63
    ...
    

    A multithreaded approach would be significantly more efficient, since most of the time is spent waiting on page loads.
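
    A minimal sketch of that idea (the helper names fetch_date and get_ads_dates_threaded are hypothetical): collect the HREFs as above, then let a small pool of workers read the dates in parallel. Each worker opens its own Chrome instance, because a WebDriver must not be shared across threads.

    from concurrent.futures import ThreadPoolExecutor

    from selenium.webdriver import Chrome, ChromeOptions
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.wait import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC


    DATE_XPATH = "//*[@id='Data']/div/div/div[4]/div[2]/span"


    def fetch_date(href):
        # hypothetical helper: one browser per call keeps the sketch
        # simple, at the cost of browser start-up time per page
        options = ChromeOptions()
        options.add_argument("--headless=new")
        with Chrome(options) as driver:
            driver.get(href)
            span = WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.XPATH, DATE_XPATH))
            )
            return span.text


    def get_ads_dates_threaded(hrefs, max_workers=4):
        # pool.map() preserves the order of the input hrefs
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(fetch_date, hrefs))

    A further refinement would be one long-lived driver per worker thread instead of one per page, but even this version overlaps the page loads, which is where most of the time goes.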