I am currently using the xml.etree Python library to parse HTML.
After finding a target DOM element, I am attempting to extract its text. Unfortunately, the .text attribute seems severely limited: it returns only the immediate inner text of an element, not anything nested. Do I really have to loop through all the children of the ElementTree? Or is there a more elegant solution?
The descendant XPath axis should return all descendant nodes, including whitespace-only text nodes.
For example:
//body/descendant::text()
or, to select only the text inside descendant elements (skipping text that sits directly under body):
//body/descendant::*/text()
In the general case:
//xpath/to/target/element/descendant::text()
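One caveat: xml.etree implements only a limited XPath subset that does not include axes or the text() node test, so the expressions above need a full XPath 1.0 engine (lxml is one option). If you want to stay in the standard library, Element.itertext() walks the same descendant text nodes in document order. Here is a minimal sketch of that approach; the sample markup and the descendant_text helper are made up for illustration, and the input is assumed to be well-formed enough for xml.etree to parse.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup (not from the question); it must be
# well-formed, because xml.etree is an XML parser, not an HTML parser.
SAMPLE = (
    "<html><body>"
    "<p>Hello <b>nested</b> world</p>"
    "<div>More <i>text</i></div>"
    "</body></html>"
)


def descendant_text(element):
    """Return every text node under `element`, in document order.

    Roughly what `descendant::text()` would select, using
    Element.itertext() from the standard library.
    """
    return list(element.itertext())


root = ET.fromstring(SAMPLE)
body = root.find("body")           # the "target element"

parts = descendant_text(body)      # individual text nodes
joined = "".join(parts)            # all nested text as one string

print(parts)
print(joined)
```

Joining the pieces with "".join(...) keeps the original spacing; if whitespace-only nodes get in the way, filtering with `if part.strip()` is one way to drop them.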