python lxml

What is the difference between xpath() and findall()?


Very often I see calls to xpath() that could just as well be replaced by calls to findall(). When can this be done? What are the main differences between the two functions?

  1. The first argument to findall, path, is a path, while the first argument to xpath, _path, is an XPath expression.

lxml docs for findall(): https://lxml.de/apidoc/lxml.etree.html#lxml.etree._Element.findall

findall(self, path, namespaces=None):
"""Finds all matching subelements, by tag name or path.

The optional namespaces argument accepts a prefix-to-namespace mapping 
that allows the usage of XPath prefixes in the path expression."""

lxml docs for xpath(): https://lxml.de/apidoc/lxml.etree.html#lxml.etree.XPath

xpath(self, _path, namespaces=None, extensions=None, smart_strings=True, **_variables)
"""Evaluate an xpath expression using the element as context node."""

However, the docs do not say what most of these arguments do, and an unlisted argument, error_log, is given an empty description.
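For illustration, here is a minimal sketch of how three of these arguments (namespaces, **_variables and smart_strings) behave in practice. The sample document, the namespace URI and the variable name `wanted` are made up for the example, and the extensions argument is not covered.

```python
from lxml import etree

NS = "http://example.com/ns"  # hypothetical namespace URI
root = etree.fromstring(
    f'<root xmlns:n="{NS}">'
    '<n:item id="a">first</n:item>'
    '<n:item id="b">second</n:item>'
    "</root>"
)

# namespaces: maps the prefixes used in the expression to namespace URIs.
prefixes = {"n": NS}

# **_variables: extra keyword arguments become XPath variables ($wanted here).
hits = root.xpath("n:item[@id = $wanted]", namespaces=prefixes, wanted="b")

# smart_strings=True (the default): string results remember their parent element.
smart = root.xpath("n:item/text()", namespaces=prefixes)
parent_tag = smart[0].getparent().tag

# smart_strings=False: string results carry no link back to the tree.
plain = root.xpath("n:item/text()", namespaces=prefixes, smart_strings=False)
```

Passing values through variables instead of formatting them into the expression string avoids quoting problems in the XPath itself.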

This seems to be the XPath specification: https://www.w3.org/TR/xpath-31/

But what is the path object supplied to findall?

The Python docs have this to say about XPath support in xml.etree.ElementTree (ElementTree is not the same as the lxml package mentioned above, see What are the differences between lxml and ElementTree?), but is that limited XPath support related to the path that findall accepts? https://docs.python.org/3.13/library/xml.etree.elementtree.html#xpath-support


Solution

  • From the lxml XPath support docs:

    lxml.etree supports the simple path syntax of the find, findall and findtext methods on ElementTree and Element, as known from the original ElementTree library (ElementPath). As an lxml specific extension, these classes also provide an xpath() method that supports expressions in the complete XPath syntax, as well as custom extension functions.

    About the ElementPath mentioned above, from the lxml.etree Tutorial:

    ElementPath

    The ElementTree library comes with a simple XPath-like path language called ElementPath. The main difference is that you can use the {namespace}tag notation in ElementPath expressions. However, advanced features like value comparison and functions are not available.

    Furthermore, the xml.etree.ElementTree docs show what "simple path syntax" refers to when a bare tag name is used:

    Element.findall() finds only elements with a tag which are direct children of the current element.
    Element.find() finds the first child with a particular tag, and Element.text accesses the element’s text content.

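    So find*() supports only the limited ElementPath subset of XPath (a bare tag name matches only direct children of the context node, although a path such as .//tag does reach descendants), while xpath() uses a full-blown XPath engine.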

    See @furas answer for test cases of the above.
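As a rough sketch of where the two methods agree and where they diverge, assuming lxml is installed (the sample documents and the namespace URI below are invented for the example):

```python
from lxml import etree

root = etree.fromstring(
    "<root>"
    "<item n='1'><item n='1.1'/></item>"
    "<item n='2'/>"
    "</root>"
)

# A bare tag name matches direct children only, in both APIs.
children_findall = [e.get("n") for e in root.findall("item")]
children_xpath = [e.get("n") for e in root.xpath("item")]

# ElementPath also understands './/' for descendants at any depth.
all_findall = [e.get("n") for e in root.findall(".//item")]
all_xpath = [e.get("n") for e in root.xpath(".//item")]

# Only xpath() can call functions and return non-element results.
total = root.xpath("count(.//item)")  # a float
attrs = root.xpath(".//item/@n")      # a list of strings

# Namespaces: findall() accepts {namespace}tag directly,
# while xpath() needs a prefix mapping.
NS = "http://example.com/ns"  # hypothetical namespace URI
ns_root = etree.fromstring(f'<r xmlns="{NS}"><item/></r>')
clark = ns_root.findall(f"{{{NS}}}item")
prefixed = ns_root.xpath("n:item", namespaces={"n": NS})
```

In short, an xpath() call can be swapped for findall() when the expression uses only child or descendant steps and simple predicates and the result is a list of elements. Expressions that use functions, value comparisons, or that return strings or numbers need xpath().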