Iterate XML using a namespace prefix


I have an xml file with a default namespace, like this:

<?xml version="1.0" encoding="utf-8"?>
<root xmlns="somelongnamespace">
  <child>...</child>
</root>

I started using lxml to iterate over and query this file, but I would like to use a namespace prefix, like this:

from lxml import etree
xml = etree.parse("myfile.xml")
root = xml.getroot()
c = root.findall('ns:child')

What do I need to do for this to work? I cannot change the file, but I could change the xml object after loading.

I read the relevant lxml documentation, searched and tried all kinds of suggestions, but got none of them to work unfortunately. This does sound like a very common question...?


Solution

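  • You can map a prefix of your choice (here ns) to the file's default namespace with a Python dictionary and pass it to findall through the namespaces argument. The prefix exists only in your queries, so the file itself does not need to change.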

    from lxml import etree
    
    xml = etree.parse("myfile.xml")
    root = xml.getroot()
    
    # Key: the prefix you want to use in queries ("ns").
    # Value: the namespace URI from your XML ("somelongnamespace").
    namespace = {"ns": "somelongnamespace"}
    
    children = root.findall('ns:child', namespaces=namespace)
    if children:
        print("Found children:")
        for child in children:
            print(etree.tostring(child, encoding='unicode').strip())
    else:
        print("No children found.")
    

    If you need to perform many operations on the XML, you can create a class that encapsulates the namespace and provides methods for querying the XML.

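    from lxml import etree
    
    
    class XmlNamespaceHandler:
        """Parse an XML file once and query it with a fixed 'ns' prefix."""
    
        def __init__(self, xml_file, namespace):
            self.xml = etree.parse(xml_file)
            self.root = self.xml.getroot()
            # Map the 'ns' prefix to the document's namespace URI.
            self.nsmap = {'ns': namespace}
    
        def findall(self, path):
            # findall() takes an ElementPath expression, a subset of XPath.
            return self.root.findall(path, namespaces=self.nsmap)
    
    
    xml_handler = XmlNamespaceHandler("myfile.xml", "somelongnamespace")
    children = xml_handler.findall('ns:child')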
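
If you would rather not type the namespace URI by hand, it can be read from the root element, because namespaced tags are stored as "{uri}localname". The following is a minimal sketch rather than a definitive recipe. It parses an inline copy of the question's document instead of myfile.xml, and the child texts "first" and "second" are made-up sample content. The fallback import is only there so the sketch also runs where lxml is not installed; every call used here exists in both lxml and the standard library's ElementTree.

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml installed
    import xml.etree.ElementTree as etree

# Inline stand-in for myfile.xml (bytes, so the encoding declaration is accepted).
# The child texts are invented sample content.
XML = b"""<?xml version="1.0" encoding="utf-8"?>
<root xmlns="somelongnamespace">
  <child>first</child>
  <child>second</child>
</root>"""

root = etree.fromstring(XML)

# A namespaced tag looks like "{somelongnamespace}root", so the URI
# can be sliced out of the root tag instead of being hard-coded.
uri = root.tag[1:root.tag.index("}")] if root.tag.startswith("{") else ""

# Option 1: prefix mapping; "ns" is an arbitrary label chosen for the query.
by_prefix = root.findall("ns:child", namespaces={"ns": uri})

# Option 2: Clark notation, "{uri}tag", which needs no mapping at all.
by_clark = root.findall(f"{{{uri}}}child")

print([c.text for c in by_prefix])  # ['first', 'second']
print([c.text for c in by_clark])   # ['first', 'second']
```

Both queries return the same elements. The prefix form is easier to read in longer paths, while Clark notation avoids passing a dictionary around.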