I am currently trying to automate the collection of data from a website. The site itself doesn't pause while it's loading data, so I have my program track the displayed number of returned search results after I type in a search query to try and make sure I'm not collecting data before the site has finished loading from its database. Checking the text section where this number is located sometimes gives me a stale reference error.
while TimingLoops < 5:
    TimingLoops = TimingLoops+1
    time.sleep(0.05)
    newelement = driver.find_element(By.XPATH, "/html/body/div[3]/div[3]/div[2]/div[1]/div[2]/span[1]")
    newelementText = newelement.text
Error
Exception has occurred: StaleElementReferenceException
Message: stale element reference: stale element not found in the current frame
To prevent the stale reference error, I inserted a line of code to check for the presence of the element containing the text that I want. I'm assuming that the value in the span disappears while it is being updated. I expected this line to stop my code from moving forward until the element was present and available to be used. Am I using the wrong expected condition? My knowledge of HTML and Python is pretty limited, so maybe this is just the wrong sort of check for this element type. I attempted to use a staleness_of check, but that cannot be used on a locator tuple, and searching through the list of Selenium expected conditions didn't reveal anything that looked more suitable than the condition I am using now.
while TimingLoops < 5:
    TimingLoops = TimingLoops+1
    time.sleep(0.05)
    newelement = WebDriverWait(driver,120).until(
        EC.presence_of_element_located((By.XPATH,
        "/html/body/div[3]/div[3]/div[2]/div[1]/div[2]/span[1]"))
    )
    newelementText = newelement.text
And I still receive the error
Exception has occurred: StaleElementReferenceException
Message: stale element reference: stale element not found in the current frame
Will I just have to revert to using an explicit wait of some kind?
This is the website: https://bonap.net/TDC/# And this is the specific element that I am trying to capture
Edit: Listing the entire code block needed to recreate the error, including the wait condition recommended by JeffC. The error occurs after typing a search into the search bar, not before, with newcount[0].text throwing a stale element reference exception. Sometimes the loop can run 20 or 30 times before the error is thrown.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
driver = webdriver.Chrome()
driver.get("https://bonap.net/TDC/")
GenusName = "Rudbeckia"
x=1
GenusCount = 0
wait = WebDriverWait(driver, 10)
while x<100:
    print("The current loop is " + str(x))
    x=x+1
    counts = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "#genus-panel div.taxon-panel-description > span")))
    ## This is the number of genera before our search
    print("The original number of genera is " + counts[0].text)
    ## Find the Genus Searchbar, delete its current text, and enter our search again
    GenusSearchBar = WebDriverWait(driver,120).until(
        EC.visibility_of_element_located((By.XPATH, "/html/body/div[3]/div[3]/div[2]/div[1]/div[3]/input"))
    )
    GenusSearchBar.clear()
    GenusSearchBar.send_keys(GenusName)
    newcount = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "#genus-panel div.taxon-panel-description > span")))
    ## This is the number of genera after we search
    print("The new count is " + newcount[0].text)
I'm not experiencing the issues that you are. This is code that I wrote and it's working. I ran it five times with no issues or errors.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
url = 'https://bonap.net/TDC/#'
driver = webdriver.Chrome()
driver.maximize_window()
driver.get(url)
wait = WebDriverWait(driver, 10)
counts = wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "#genus-panel div.taxon-panel-description > span")))
print(f'genera: {counts[0].text}')
print(f'species: {counts[1].text}')
and it prints
genera: 3625
species: 24184
NOTE: I just realized... you are using EC.presence_of_element_located(). Presence means the element is in the DOM but does not guarantee that it's visible or ready to be interacted with. That could be causing your issue.
If you need to click an element, use EC.element_to_be_clickable(). If you need to interact with an element, e.g. .text, .send_keys, etc., then use EC.visibility_of_element_located(). Generally I would avoid waiting for presence unless you are doing something specific where you are NOT going to interact with the element and you know what you're doing.
A stale element is a reference, stored in a variable, to an element that no longer exists in the DOM because the page has reloaded or changed. For example
e = driver.find_element(...)
driver.refresh() # this wipes out the reference stored in 'e'
e.click() # this throws a stale element exception
The way to avoid stale elements is to make sure the page is stable/loaded before scraping it. In this case, the code was running faster than the page: the search takes a second to load, so your code grabbed a reference, the page then updated, and accessing that old reference threw the exception. The fix is to grab a reference, do the search, wait for that reference to go stale using EC.staleness_of(), then wait for the desired element, and only then access it.
I've updated the code to take care of the stale element exception and add a loop of search terms.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
url = 'https://bonap.net/TDC/#'
driver = webdriver.Chrome()
driver.maximize_window()
driver.get(url)
genera = ["Rudbeckia","Abelia","Abies"]
wait = WebDriverWait(driver, 10)
for genus in genera:
    count = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#genus-panel div.taxon-panel-description > span")))
    search = wait.until(EC.visibility_of_element_located((By.ID, "genus-filter")))
    search.clear()
    search.send_keys(genus)
    wait.until(EC.staleness_of(count))
    count = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#genus-panel div.taxon-panel-description > span")))
    print(f'genus <{genus}> count: {count.text}')
This outputs:
genus <Rudbeckia> count: 1
genus <Abelia> count: 1
genus <Abies> count: 1
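The loop above relies on the count element going stale after each search. If that signal turns out to be unreliable on this site, one alternative is to poll the count until it stops changing. The following is only a rough sketch of that idea, not something I've run against the site: wait_until_stable and its parameters are made-up names, and the helper is plain Python so that it can be tried without a browser.

```python
import time

def wait_until_stable(read, settle=0.5, timeout=10.0, poll=0.05, retry_on=()):
    """Poll read() until its value has stayed the same for `settle` seconds.

    read     -- zero-argument callable returning the current value (hypothetical name)
    retry_on -- exception class or tuple to treat as "still changing"
    """
    deadline = time.monotonic() + timeout
    last = object()              # sentinel: never equal to a real reading
    stable_since = time.monotonic()
    while time.monotonic() < deadline:
        try:
            value = read()
        except retry_on:
            # The element was replaced mid-read, so restart the settle clock.
            last = object()
            stable_since = time.monotonic()
        else:
            now = time.monotonic()
            if value != last:
                last, stable_since = value, now   # value changed: restart the clock
            elif now - stable_since >= settle:
                return value                      # unchanged for long enough
        time.sleep(poll)
    raise TimeoutError(f"value did not settle within {timeout} seconds")

# Possible use with the Selenium code above (assumption, needs a live browser):
# from selenium.common.exceptions import StaleElementReferenceException
# text = wait_until_stable(
#     lambda: driver.find_element(By.CSS_SELECTOR, "#genus-panel div.taxon-panel-description > span").text,
#     retry_on=StaleElementReferenceException,
# )
```

The settle time is a guess that has to be tuned per site. If it is too short you can still read an intermediate count, and if it is too long every search gets slower.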
Potential solution for the occasional stale element exception... turn
wait.until(EC.staleness_of(count))
into
try:
    wait.until(EC.staleness_of(count))
except TimeoutException:
    # the element never went stale within the timeout; ignore and move on
    pass
and add the import
from selenium.common.exceptions import TimeoutException
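Swallowing the timeout keeps the loop moving, but reading count.text afterwards can still hit a stale reference if the span gets replaced again. A possible extra safety net is to locate the element and read its text in one step, retrying when the reference goes stale. This is a minimal sketch under that assumption. read_with_retry is a hypothetical helper, and it takes the exception type as an argument so it can be exercised without Selenium.

```python
import time

def read_with_retry(locate, retry_on, attempts=5, delay=0.1):
    """Return locate().text, re-locating the element if the reference goes stale.

    locate   -- zero-argument callable that returns a fresh element each time
    retry_on -- exception class (or tuple) that signals a stale reference
    """
    for attempt in range(1, attempts + 1):
        try:
            # Locate and read in the same step so the reference is as fresh as possible.
            return locate().text
        except retry_on:
            if attempt == attempts:
                raise            # out of attempts: surface the last error
            time.sleep(delay)    # give the page a moment before re-locating

# Possible use with the Selenium code above (assumption, needs a live browser):
# from selenium.common.exceptions import StaleElementReferenceException
# text = read_with_retry(
#     lambda: driver.find_element(By.CSS_SELECTOR, "#genus-panel div.taxon-panel-description > span"),
#     retry_on=StaleElementReferenceException,
# )
```

WebDriverWait also accepts an ignored_exceptions argument, so something like WebDriverWait(driver, 10, ignored_exceptions=[StaleElementReferenceException]).until(lambda d: d.find_element(...).text) should give similar behavior using the library's own polling.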