I'm trying to perform what I thought was a fairly simple task. For each element in a list of web elements, I want to find its href, go to that page, get some data, and move on to the next element. Here is my function:
def get_ads_dates(driver, base_url):
    objects_dates = []
    wait = WebDriverWait(driver, 5)
    driver.get(base_url)
    # ads is a list of web elements
    ads = driver.find_elements(By.CLASS_NAME, 'object-cards-block.d-flex.cursor-pointer')
    for ad in ads:
        try:
            ad_link = wait(ad, 5).until(EC.presence_of_element_located((By.CSS_SELECTOR, 'div.main-container-margins.width-100 > a')))
        except NoSuchElementException:
            print('no such element')
        # following the link
        driver.get(ad_link.get_attribute('href'))
        ad_date = driver.find_element(By.XPATH, '//*[@id="Data"]/div/div/div[4]/div[2]/span').text
        objects_dates.append(ad_date)
I found the wait().until(EC.presence_of_element_located()) construction here and here, but:

- with wait(ad, 5) I get TypeError: 'WebDriverWait' object is not callable;
- with ad alone I get StaleElementReferenceException;
- when not pointing directly at the elements in the list, I get data only for the first element.

How can I loop through all web elements in the list using a wait strategy? Python 3.11, PyCharm.
Once you've called get(), the driver has a "view" of that page. Once you call get() again, that "view" is lost, and any element references you obtained from the previous page become stale.
There are many ways to approach this. The simplest (though not the most efficient) is to collect all the HREFs first, then visit the individual pages, as follows:
from selenium.webdriver import Chrome, ChromeOptions
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

BASE_URL = "https://upn.ru/kupit/kvartiry"

def get_ads_dates(driver, base_url):
    driver.get(base_url)
    wait = WebDriverWait(driver, 10)
    ec = EC.presence_of_all_elements_located
    loc = (By.CLASS_NAME, "object-cards-block.d-flex.cursor-pointer")
    # first pass: collect plain href strings, which survive navigation
    hrefs = []
    for a in wait.until(ec(loc)):
        link = a.find_element(
            By.CSS_SELECTOR, "div.main-container-margins.width-100 > a"
        )
        if (href := link.get_attribute("href")) is not None:
            hrefs.append(href)
    # second pass: visit each page and extract the date
    dates = []
    for href in hrefs:
        driver.get(href)
        ec = EC.presence_of_element_located
        loc = (By.XPATH, "//*[@id='Data']/div/div/div[4]/div[2]/span")
        span = wait.until(ec(loc))
        text = span.text
        dates.append(text)
        print(text)
    return dates

if __name__ == "__main__":
    options = ChromeOptions()
    options.add_argument("--headless=new")
    with Chrome(options) as driver_:
        ads = get_ads_dates(driver_, BASE_URL)
Output (partial):
Размещено: 16.06.2025 11
Размещено: 06.06.2025 6
Размещено: 02.06.2025 57
Размещено: 19.04.2025 42
Размещено: 03.04.2025 29
Размещено: 25.03.2025 63
...
A multithreaded approach would be significantly more efficient.
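As a rough sketch of that idea (hypothetical code, not tested against this site): split the collected hrefs across a ThreadPoolExecutor. Note that a single WebDriver instance is not safe to share across threads, so each worker would need its own driver; the `fetch_date` stub below is a placeholder for the real per-page Selenium work.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_date(href: str) -> str:
    # Hypothetical stand-in for the real per-page work: a real version
    # would call driver.get(href) and locate the date <span>, using a
    # per-thread (or pooled) WebDriver instance.
    return f"date for {href}"

def get_dates_concurrently(hrefs, max_workers=4):
    # pool.map preserves input order, so results line up with hrefs
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_date, hrefs))

print(get_dates_concurrently(["/ad/1", "/ad/2"]))
```

The fan-out/fan-in shape stays the same once `fetch_date` is replaced with real Selenium calls; only the worker body changes.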