xml xpath

How to find the path to an XML element


I want to find the path to a given XML element (node). I have tried xmllint and xml_grep; both return the element(s) I'm searching for, but as far as I can tell, neither returns the path to that element.

Example XML:

<myroot>
   <my2ndlvl>
      <my3rdlvl foo="bar"/>
   </my2ndlvl>
   <my2ndlvl>
      <my3rdlvl fum="baz"/>
   </my2ndlvl>
</myroot>

I've tried a number of variants of xmllint --xpath '//my3rdlvl[@fum="baz"]' and xml_grep --cond '//my3rdlvl[@fum="baz"]', but both just return the node <my3rdlvl fum="baz"/> (xml_grep wraps the node in its own <xml_grep ...> node, but that's no help). What I want to get back is something like

myroot/my2ndlvl/my3rdlvl[@fum="baz"]

or an XML representation of that path (containing only the nodes that lie on it).

How can I find this path? Is there a way to make xmllint or xml_grep do it?


Solution

  • There are two tools[1] that can produce XPath expressions from XML or HTML documents:

    xml2xpath.sh

    # quiet (-q), absolute paths (-a), starting at expression (-s)
    xml2xpath.sh -q -a -s '//my3rdlvl[@fum="baz"]' ~/tmp/tmp2.xml 
    
    /myroot/my2ndlvl[2]/my3rdlvl
    /myroot/my2ndlvl[2]/my3rdlvl/@fum
    

    Test the expressions found:

    f='/home/lmc/tmp/tmp2.xml'
    # read one expression per line so that expressions containing spaces are not split
    xml2xpath.sh -q -a -s '//my3rdlvl[@fum="baz"]' "$f" | grep -v '^$' |
    while IFS= read -r x; do
        xmllint --xpath "$x" "$f"
    done
    

    Result:

    <my3rdlvl fum="baz"/>
     fum="baz"
    

    pyxml2xpath

    # pyxml2xpath <file path> [mode] [initial xpath expression] [with element count: yes|true] [max elements: int] [no banner: yes|true]
    pyxml2xpath ~/tmp/tmp2.xml xpath '//my3rdlvl[@fum="baz"]' false 10 true
    
    /myroot/my2ndlvl[2]/my3rdlvl
    

    [1] Disclaimer: I'm the author of both tools.
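
If neither tool is available, the same kind of absolute path can be approximated with Python's standard library. The following is only a minimal sketch, not how xml2xpath.sh or pyxml2xpath work internally: the function name absolute_paths is made up for illustration, and it assumes a document without namespaces and a query written in the limited XPath subset that ElementTree supports (a query starting with .// does not match the root element itself). A positional index is added only when a parent has several children with the same tag.

```python
import xml.etree.ElementTree as ET

# Sample document from the question.
XML = """<myroot>
   <my2ndlvl>
      <my3rdlvl foo="bar"/>
   </my2ndlvl>
   <my2ndlvl>
      <my3rdlvl fum="baz"/>
   </my2ndlvl>
</myroot>"""


def absolute_paths(root, query):
    """Return an absolute XPath for every element matched by `query`.

    Hypothetical helper for illustration; assumes no XML namespaces.
    """
    # ElementTree elements do not know their parent, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    paths = []
    for node in root.findall(query):
        steps = []
        while node is not None:
            parent = parent_of.get(node)
            step = node.tag
            if parent is not None:
                same_tag = [c for c in parent if c.tag == node.tag]
                if len(same_tag) > 1:
                    # XPath positions are 1-based.
                    step += "[%d]" % (same_tag.index(node) + 1)
            steps.append(step)
            node = parent  # walk up until the root (which has no parent)
        paths.append("/" + "/".join(reversed(steps)))
    return paths


root = ET.fromstring(XML)
for path in absolute_paths(root, ".//my3rdlvl[@fum='baz']"):
    print(path)
```

For the example XML this prints /myroot/my2ndlvl[2]/my3rdlvl, the same path the two tools report for the element. To work on a file instead of a string, ET.parse(filename).getroot() can replace ET.fromstring(XML).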