[SOLVED] Proper way to extract XML elements from a namespace

In a Python script I call a SOAP service that returns an XML reply whose elements have a namespace prefix, for example:

<ns0:foo xmlns:ns0="SOME-URI">
  <ns0:bar>abc</ns0:bar>
</ns0:foo>

I can extract the content of ns0:bar with the method call

doc.getElementsByTagName('ns0:bar')

However, the prefix ns0 is only a local alias, so to speak (it is not mentioned in the schema), and could just as well have been named flubber or you_should_not_care. What is the proper way to extract the content of a namespaced element without relying on a specific prefix? In my case the SOAP service did change the prefix, which caused my parsing to fail.
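This is a minimal reproduction of the problem. It assumes the standard library's xml.dom.minidom, because the question does not say which DOM implementation is in use. The two replies differ only in the prefix bound to SOME-URI. The prefix-based lookup finds the element in the first reply and misses it in the second:

```python
from xml.dom import minidom

# Two replies that are identical except for the prefix the server chose.
# "SOME-URI" is the placeholder namespace from the question.
REPLY_NS0 = '<ns0:foo xmlns:ns0="SOME-URI"><ns0:bar>abc</ns0:bar></ns0:foo>'
REPLY_FLUBBER = '<flubber:foo xmlns:flubber="SOME-URI"><flubber:bar>abc</flubber:bar></flubber:foo>'


def count_by_prefixed_name(xml_text):
    """Count <ns0:bar> elements by matching the literal qualified name."""
    doc = minidom.parseString(xml_text)
    return len(doc.getElementsByTagName('ns0:bar'))


print(count_by_prefixed_name(REPLY_NS0))      # 1
print(count_by_prefixed_name(REPLY_FLUBBER))  # 0: same element, different prefix
```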

Solution

Search by namespace URI and local name instead of by the prefixed name. The DOM API has a namespace-aware variant for this:

doc.getElementsByTagNameNS('SOME-URI', 'bar')
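The following sketch shows the namespace-aware lookup with xml.dom.minidom. It again assumes minidom is the DOM in use, and bar_texts is a hypothetical helper name. The match is on the URI and local name, so it finds the element under ns0, under any other prefix, and under a default namespace declaration with no prefix at all:

```python
from xml.dom import minidom


def bar_texts(xml_text, uri='SOME-URI'):
    """Return the text of every 'bar' element in namespace `uri`,
    whatever prefix (if any) the document binds to that URI."""
    doc = minidom.parseString(xml_text)
    texts = []
    for el in doc.getElementsByTagNameNS(uri, 'bar'):
        # Join the element's direct text-node children.
        texts.append(''.join(n.data for n in el.childNodes
                             if n.nodeType == n.TEXT_NODE))
    return texts


print(bar_texts('<ns0:foo xmlns:ns0="SOME-URI"><ns0:bar>abc</ns0:bar></ns0:foo>'))  # ['abc']
print(bar_texts('<x:foo xmlns:x="SOME-URI"><x:bar>abc</x:bar></x:foo>'))            # ['abc']
print(bar_texts('<foo xmlns="SOME-URI"><bar>abc</bar></foo>'))                      # ['abc']
```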

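With a namespace-aware library such as lxml (the standard library's xml.etree.ElementTree works the same way), write the tag as {namespace-URI}localname: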

tree.findall('{http://schemas.xmlsoap.org/soap/envelope/}Body')
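Here is the same idea with the standard library's xml.etree.ElementTree. It runs against the question's sample reply rather than a full SOAP envelope. ElementTree also stores tags in {URI}localname form. Its find/findall methods accept an optional prefix-to-URI mapping, so the prefix you write in the query does not have to match the one the server used. In the code below, "my" is an arbitrary choice:

```python
import xml.etree.ElementTree as ET

REPLY = '<ns0:foo xmlns:ns0="SOME-URI"><ns0:bar>abc</ns0:bar></ns0:foo>'
root = ET.fromstring(REPLY)

# Every namespaced tag is stored as "{URI}localname" (Clark notation),
# so the server's prefix never appears in the parsed tree.
print(root.tag)                         # {SOME-URI}foo
print(root.find('{SOME-URI}bar').text)  # abc

# Alternatively, map a prefix of *your* choosing to the URI.
ns = {'my': 'SOME-URI'}
print(root.find('my:bar', ns).text)                     # abc
print([e.text for e in root.findall('.//my:bar', ns)])  # ['abc']
```

A bare path such as the Body example only looks at direct children of the element it is applied to (for a tree, the root element). Prefix the path with .// to search all descendants.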

or, ignoring the namespace entirely, by local name with XPath:

tree.xpath('//*[local-name()="bar"]')
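This is a standard-library counterpart to the local-name() query. The {*} wildcard assumes Python 3.8 or newer, and local_name is a hypothetical helper. Matching on the local name alone also picks up same-named elements from unrelated namespaces. When the URI is known, the URI-based lookups are the stricter choice:

```python
import xml.etree.ElementTree as ET

# Sample reply with a different prefix; only the URI and local names matter.
REPLY = '<flubber:foo xmlns:flubber="SOME-URI"><flubber:bar>abc</flubber:bar></flubber:foo>'
root = ET.fromstring(REPLY)

# '{*}bar' matches a 'bar' element in any namespace, or in none (Python 3.8+).
print([e.text for e in root.findall('.//{*}bar')])  # ['abc']


def local_name(tag):
    """Strip a leading '{URI}' from a Clark-notation tag; non-string tags
    (e.g. comments, if the parser was told to keep them) yield None."""
    if not isinstance(tag, str):
        return None
    return tag.rsplit('}', 1)[-1]


# Manual equivalent of XPath local-name() that also works on older Pythons.
print([e.text for e in root.iter() if local_name(e.tag) == 'bar'])  # ['abc']
```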

lxml example

from lxml import etree
tree = etree.parse("/home/lmc/tmp/soap.xml")  # saved SOAP reply
# match <Company> in any namespace, whatever prefix the reply uses
tree.xpath('//*[local-name()="Company"]')

Result

[<Element {http://example.com}Company at 0x7f0959fb3fc0>]