python xml lxml elementtree celementtree

Parse XML with lxml, and then manipulate it with cElementTree


I have an app that repeatedly reloads a large amount of XML data from a file, manipulates it, and then writes it back to the file.

The lxml library has proven much faster at parsing and serializing XML, but cElementTree is much faster for certain kinds of manipulation. The two have almost identical APIs.

How can I parse an XML file with lxml, and then manipulate it with cElementTree?

This is what I've tried, but the object produced by lxml's parse methods inherently uses its own manipulation methods.

import xml.etree.cElementTree as ET
from lxml import etree as lxmlET

Solution

  • This question is perhaps the Python equivalent of "My friend has a fast car and I just have a clunker. How can I make my car go as fast as hers?"

    I'm not saying this couldn't be done, but I would call such an enterprise either ambitious or foolhardy, depending on your level of programming skill. The point is that each system has, as you have discovered, its own internal representation of the parsed XML.

    While it might be possible to write code to take the parsed object produced by lxml and re-create or wrap it as ElementTree elements, it's probably going to a) take as long as parsing with ElementTree in the first place, and b) be a maintenance nightmare.

    So do yourself a favor and choose one technology then stick with it (at least for each individual program).

    I would also point out that XML was intended primarily as a data interchange language. The fact that you seem to be using it as a structured data repository inevitably introduces large inefficiencies in the processing, particularly as data volumes go up. Might it be better to choose some more amenable representation and then only convert it to XML for output and usage by other systems?
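
For reference, the "re-create it as ElementTree elements" route mentioned above could be sketched roughly as follows. This is only an illustration, not a recommendation: the helper name copy_to_elementtree is made up for this example, and it relies only on the attributes that lxml and ElementTree elements share (tag, attrib, text, tail, and iteration over children). Note that on current Python versions xml.etree.ElementTree picks up the C accelerator automatically (since 3.3), and the separate xml.etree.cElementTree module was removed in 3.9. The demo uses a standard-library tree as a stand-in for the lxml one so that it runs without lxml installed.

```python
import xml.etree.ElementTree as ET  # C-accelerated automatically since Python 3.3


def copy_to_elementtree(src):
    """Rebuild `src` as a standard-library ElementTree element.

    `src` may be any element exposing tag/attrib/text/tail and child
    iteration, which includes lxml elements. (Hypothetical helper.)
    """
    dst = ET.Element(src.tag, dict(src.attrib))
    dst.text = src.text
    dst.tail = src.tail
    for child in src:
        # lxml represents comments and processing instructions as nodes
        # whose .tag is not a string. This sketch simply skips them, which
        # also drops any tail text attached to them.
        if isinstance(child.tag, str):
            dst.append(copy_to_elementtree(child))
    return dst


# Stand-in source tree; with lxml this would be the root returned by
# lxml's own parser instead.
source = ET.fromstring('<root id="1"><item name="a">text</item><item/></root>')
rebuilt = copy_to_elementtree(source)

# The copy serializes identically but shares no nodes with the original.
print(ET.tostring(rebuilt) == ET.tostring(source))
```

Every node is visited and allocated a second time, which is the cost the answer warns about: the copy is likely to eat whatever was gained by parsing with lxml. Serializing with lxml and re-parsing the bytes with ET.fromstring is even simpler to write, but it amounts to parsing the document twice.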