python, xml, xml-parsing

Proper way to extract XML elements from a namespace


In a Python script I call a SOAP service that returns an XML reply whose elements carry a namespace prefix, for example:

<ns0:foo xmlns:ns0="SOME-URI">
  <ns0:bar>abc</ns0:bar>
</ns0:foo>

I can extract the content of ns0:bar with the method call

doc.getElementsByTagName('ns0:bar')

However, the prefix ns0 is only a local alias for the namespace URI (it is not mentioned in the schema) and could just as well have been flubber or you_should_not_care. What is the proper way to extract the content of a namespaced element without relying on a specific prefix? In my case the SOAP service did change the prefix, which resulted in a parse failure.


Solution

  • Search by namespace URI and local name instead of by prefix. This requires the namespace-aware variant of the DOM method:

    doc.getElementsByTagNameNS('SOME-URI','bar')
    

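    With a namespace-aware package such as lxml, put the namespace URI in braces in front of the tag name (here using the SOAP envelope namespace as an example):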

    tree.findall('{http://schemas.xmlsoap.org/soap/envelope/}Body')
    

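    or match on the local name alone, which ignores the namespace entirely (and so also matches a bar element from any other namespace):

    tree.xpath('//*[local-name()="bar"]')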
    

    lxml example

    from lxml import etree
    tree = etree.parse("/home/lmc/tmp/soap.xml")
    tree.xpath('//*[local-name()="Company"]')
    

    Result

    [<Element {http://example.com}Company at 0x7f0959fb3fc0>]
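
To check that the lookup really is independent of the prefix, here is a minimal sketch using only the standard library. It assumes the reply has the shape shown in the question; make_reply, bar_text_minidom and bar_text_etree are hypothetical helper names introduced for illustration, and xml.etree.ElementTree is used here as a stand-in for lxml because it accepts the same {URI}tag notation.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

NS_URI = "SOME-URI"  # namespace URI taken from the question's sample reply


def make_reply(prefix):
    """Build the question's sample reply with an arbitrary prefix (illustrative helper)."""
    return (
        f'<{prefix}:foo xmlns:{prefix}="{NS_URI}">'
        f"<{prefix}:bar>abc</{prefix}:bar>"
        f"</{prefix}:foo>"
    )


def bar_text_minidom(xml_text):
    """Look up 'bar' by namespace URI and local name with the DOM API."""
    doc = minidom.parseString(xml_text)
    nodes = doc.getElementsByTagNameNS(NS_URI, "bar")
    # Each matched element holds a single text node in this sample.
    return [node.firstChild.data for node in nodes]


def bar_text_etree(xml_text):
    """Same lookup with ElementTree's {URI}tag notation."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(f"{{{NS_URI}}}bar")]


if __name__ == "__main__":
    # The prefix changes, the namespace URI does not, so both lookups keep working.
    for prefix in ("ns0", "flubber"):
        reply = make_reply(prefix)
        print(prefix, bar_text_minidom(reply), bar_text_etree(reply))
```

Both functions key on the namespace URI, which is the part the schema actually fixes, so a renamed prefix on the service side should no longer break the parse.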