I am using lxml with XPath to parse an EPUB 3 XHTML content file.
I want to select all the li nodes with the attribute epub:type="footnote", for example:
<li epub:type="footnote" id="fn14"> ... </li>
I cannot find the right XPath expression for it.
The expression
//*[self::li][@id]
does select all the li nodes that have an id attribute, but when I try
//*[self::li][@epub:type]
I get the error
lxml.etree.XPathEvalError: Undefined namespace prefix
The XML is:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head>
<meta charset="utf-8" />
<link rel="stylesheet" href="stylesheet.css" />
</head>
<body>
<section class="footnotes">
<hr />
<ol>
<li id="fn1" epub:type="footnote">
<p>See foo</p>
</li>
</ol>
</section>
</body>
</html>
Any suggestions on how to write the correct expression?
Have you declared the namespace prefix epub to lxml? It has to be passed in the namespaces argument:
>>> tree.getroot().xpath(
... "//li[@epub:type = 'footnote']",
... namespaces={'epub':'http://www.idpf.org/2007/ops'}
... )
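Declaring epub removes the error, but on this particular document the expression still matches nothing. A small check, using a trimmed, hypothetical copy of the question's XHTML: in XPath 1.0 an unprefixed name such as li only matches elements in no namespace, while the document's default xmlns puts every li in the XHTML namespace.

```python
from lxml import etree

# Trimmed copy of the question's content file (hypothetical variable name XHTML).
XHTML = b"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<body><ol><li id="fn1" epub:type="footnote"><p>See foo</p></li></ol></body>
</html>"""

root = etree.fromstring(XHTML)
ns = {"epub": "http://www.idpf.org/2007/ops"}

# No error now, but the unprefixed name test li looks for li in *no* namespace.
unprefixed = root.xpath("//li[@epub:type = 'footnote']", namespaces=ns)
print(unprefixed)  # prints []

# The element is really in the XHTML namespace (Clark notation: {uri}localname).
li_tag = root.find(".//{http://www.w3.org/1999/xhtml}li").tag
print(li_tag)  # prints {http://www.w3.org/1999/xhtml}li
```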
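The XHTML namespace is also tripping you up. The document declares xmlns="http://www.w3.org/1999/xhtml" as its default namespace, so every li is in that namespace, and in XPath 1.0 an unprefixed li only matches elements in no namespace. Bind a prefix to the XHTML namespace as well and use it in the expression. Try: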
>>> tree.getroot().xpath(
... "//xhtml:li[@epub:type = 'footnote']",
... namespaces={'epub':'http://www.idpf.org/2007/ops', 'xhtml': 'http://www.w3.org/1999/xhtml'}
... )
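Putting both fixes together, a self-contained sketch that runs the query against a copy of the content file. Assumptions: the document is parsed with lxml.etree, the names NS, XHTML and find_footnotes are hypothetical, and the second li (without epub:type) is added only to show that it is filtered out. The expression is compiled once with etree.XPath, which is convenient when the same query runs over many content files of an EPUB.

```python
from lxml import etree

# Prefixes are arbitrary labels for the query; only the namespace URIs must match the document.
NS = {
    "xhtml": "http://www.w3.org/1999/xhtml",
    "epub": "http://www.idpf.org/2007/ops",
}

# Copy of the question's content file plus one hypothetical non-footnote li for contrast.
XHTML = b"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<body>
<section class="footnotes">
<ol>
<li id="fn1" epub:type="footnote"><p>See foo</p></li>
<li id="fn2"><p>Not a footnote</p></li>
</ol>
</section>
</body>
</html>"""

# Compile the expression once; it can then be applied to any element or tree.
find_footnotes = etree.XPath("//xhtml:li[@epub:type = 'footnote']", namespaces=NS)

root = etree.fromstring(XHTML)
footnotes = find_footnotes(root)
footnote_ids = [li.get("id") for li in footnotes]
print(footnote_ids)  # prints ['fn1']
```

Note that the xhtml prefix does not appear anywhere in the document; it only exists in the mapping passed to lxml, which is why any prefix name would work as long as it is bound to the XHTML namespace URI.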