I have an XML file with a default namespace, like this:
<?xml version="1.0" encoding="utf-8"?>
<root xmlns="somelongnamespace">
<child>...</child>
</root>
I started using lxml to iterate over and query this file, but I would like to use a namespace prefix, like this:
from lxml import etree
xml = etree.parse("myfile.xml")
root = xml.getroot()
c = root.findall('ns:child')
What do I need to do for this to work? I cannot change the file, but I could change the xml object after loading.
I read the relevant lxml documentation, searched, and tried all kinds of suggestions, but unfortunately could not get any of them to work. This does sound like a very common question...?
You can map the ns prefix to your default namespace with a Python dictionary and pass it to findall through the namespaces argument.
from lxml import etree
xml = etree.parse("myfile.xml")
root = xml.getroot()
# a dictionary where the key "ns" is the prefix you want to use, and the value "somelongnamespace" is the namespace URI from your XML
namespace = {"ns": "somelongnamespace"}
children = root.findall('ns:child', namespaces=namespace)
if children:
    print("Found children:")
    for child in children:
        print(etree.tostring(child, encoding='unicode').strip())
else:
    print("No children found.")
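If you would rather not type the namespace URI by hand, a minimal sketch along these lines may help. It reads the default namespace from root.nsmap (lxml stores it under the key None) and binds it to a prefix. The inline document is only a stand-in for myfile.xml, and the variable names are illustrative assumptions.

```python
from lxml import etree

# In-memory stand-in for myfile.xml so the sketch runs on its own (assumed content).
doc = b'<root xmlns="somelongnamespace"><child>a</child><child>b</child></root>'
root = etree.fromstring(doc)

# lxml exposes the namespaces in scope on an element; the default
# namespace (the one declared without a prefix) sits under the key None.
default_uri = root.nsmap[None]

# Bind a prefix of your choice to that URI and query with it.
namespace = {"ns": default_uri}
by_prefix = root.findall("ns:child", namespaces=namespace)

# The same query in Clark notation, which needs no prefix mapping at all.
by_clark = root.findall("{%s}child" % default_uri)

# An unqualified name looks for elements in no namespace, so it finds nothing here.
unqualified = root.findall("child")

print([c.text for c in by_prefix])
print(len(unqualified))
```

The prefix in the query is only a local alias: matching is done on the namespace URI, which is why the file itself does not need to change.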
If you need to perform many operations on the XML, you can create a class that encapsulates the namespace and provides methods for querying the XML.
class XmlNamespaceHandler:
    def __init__(self, xml_file, namespace):
        self.xml = etree.parse(xml_file)
        self.root = self.xml.getroot()
        # Map the "ns" prefix to the namespace URI once, for every query.
        self.nsmap = {'ns': namespace}

    def findall(self, xpath):
        return self.root.findall(xpath, namespaces=self.nsmap)
xml_handler = XmlNamespaceHandler("myfile.xml", "somelongnamespace")
children = xml_handler.findall('ns:child')
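The same prefix mapping should also work with lxml's xpath() method if you need full XPath expressions rather than the ElementPath subset that findall supports. This is a rough sketch, again using an in-memory stand-in for myfile.xml. Note that, as far as I know, xpath() does not accept a None key for the default namespace, so an explicit prefix is needed there.

```python
from lxml import etree

# In-memory stand-in for myfile.xml (assumed content).
doc = b'<root xmlns="somelongnamespace"><child>a</child><child>b</child></root>'
root = etree.fromstring(doc)

# Same kind of mapping as above: the prefix "ns" is our own choice.
nsmap = {"ns": "somelongnamespace"}

# Full XPath: select the text nodes of all child elements.
texts = root.xpath("ns:child/text()", namespaces=nsmap)

# XPath functions work too; count() returns a float.
count = root.xpath("count(ns:child)", namespaces=nsmap)

print(list(texts))
print(count)
```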