<div class="card-block cms">
<p>and then have a tea or coffee on the balcony of the cafeteria.</p>
<p>&nbsp;</p>
</div>
I am trying to check whether the text I crawl from a website contains a non-breaking space (&nbsp;):
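from selenium.webdriver.common.by import By

# `driver` is an already-initialised WebDriver instance
texts = driver.find_element(By.XPATH, "//div[@class='card-block cms']")
textInDivTag = texts.text
print(textInDivTag)
if "\xa0" in textInDivTag:
    print("yes")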
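To check the raw HTML independently of Selenium, the question's snippet can be fed to Python's standard-library html.parser. This is a minimal sketch that assumes the empty-looking paragraph holds &nbsp;; the TextCollector class and raw_text helper are hypothetical names. With the default convert_charrefs=True, the parser decodes &nbsp; to U+00A0, so the same "\xa0" check succeeds on the source text:

```python
from html.parser import HTMLParser

# The snippet from the question, assuming the empty-looking <p> holds "&nbsp;"
SNIPPET = """<div class="card-block cms">
<p>and then have a tea or coffee on the balcony of the cafeteria.</p>
<p>&nbsp;</p>
</div>"""


class TextCollector(HTMLParser):
    """Hypothetical helper that collects every text node it sees."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default: &nbsp; -> U+00A0
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def raw_text(html):
    """Return the concatenated text nodes of `html`, entities decoded."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)


text = raw_text(SNIPPET)
print("\xa0" in text)  # True: the source really contains a non-breaking space
```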
My output is as follows:
and then have a tea or coffee on the balcony of the cafeteria.
As you can see, "yes" is never printed, so the check does not detect the non-breaking space.
The character is recognized, but it is converted to a normal space (u"\x20").
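The effect of that conversion on the check in the question can be modelled in plain Python. The simulate_rendered_text function below is a hypothetical stand-in, not Selenium's implementation: it reproduces only the single behaviour described here (U+00A0 becomes U+0020), and the sample string is made up for illustration:

```python
def simulate_rendered_text(raw: str) -> str:
    """Hypothetical model of .text: only maps U+00A0 to a regular space U+0020."""
    return raw.replace("\xa0", " ")


raw = "have a tea\xa0or coffee"  # made-up sample containing a non-breaking space
rendered = simulate_rendered_text(raw)

print("\xa0" in raw)                        # True
print("\xa0" in rendered)                   # False: the check from the question fails
print(rendered == "have a tea or coffee")   # True
```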
According to the comment in the Java Selenium source code, .text / .getText() returns the visible text, and it references the W3C WebDriver specification, section "11.3.5 Get Element Text" (emphasis added by me):
The Get Element Text command intends to return an element’s text “as rendered”. An element’s rendered text is also used for locating a elements by their link text and partial link text.
One of the major inputs to this specification was the open source Selenium project. This was in wide-spread use before this specification written, and so had set user expectations of how the Get Element Text command should work. As such, the approach presented here is known to be flawed, but provides the best compatibility with existing users.
So this behavior is probably in line with the specification, but I have not yet found the source code that specifically replaces non-breaking spaces with regular spaces. I also could not find an issue about it in the Selenium repository, so you might want to try opening one.
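One possible workaround, not part of the original answer and offered only as a sketch: read the element's textContent DOM property instead of .text. textContent is returned as stored in the DOM rather than "as rendered", so a &nbsp; should arrive in Python as U+00A0. The contains_nbsp helper is a hypothetical name; get_property is an existing WebElement method in Selenium's Python bindings:

```python
def contains_nbsp(element) -> bool:
    """Return True if the element's DOM textContent contains U+00A0.

    Hypothetical helper: unlike .text, the textContent property is not
    normalised, so non-breaking spaces are not turned into regular spaces.
    """
    raw = element.get_property("textContent") or ""  # None-safe
    return "\xa0" in raw


# Usage, assuming `driver` is an initialised WebDriver and `By` is imported:
# element = driver.find_element(By.XPATH, "//div[@class='card-block cms']")
# if contains_nbsp(element):
#     print("yes")
```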