[SOLVED] How to Cache Elements to Increase Runtime Performance with the lxml Python Library

On the lxml website (https://lxml.de/performance.html) I see the following statement:

A way to improve the normal attribute access time is static instantiation of the Python objects, thus trading memory for speed. Just create a cache dictionary and run:

cache[root] = list(root.iter())

after parsing and:

del cache[root]

when you are done with the tree.

Can anyone provide a suitable Python code example showing how the above mechanism can be used in a Python function?

Solution

Assigning cache[root] = list(root.iter()) keeps the Python object of every element in the tree alive in memory, as a simple test demonstrates.
The mechanism is simple: the list holds a reference to each element object, so however an element is obtained later (attribute access, XPath, iteration), lxml hands back the same already-instantiated object instead of creating a new one.
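
The pattern quoted in the question (fill the cache after parsing, delete the entry when done) can be wrapped so that the cleanup is not forgotten. A minimal sketch, assuming a hypothetical helper named cached_elements (it is not part of lxml) and a small inline XML string in place of a file:

```python
from contextlib import contextmanager

from lxml import objectify


@contextmanager
def cached_elements(root, cache):
    """Hypothetical helper: hold every element object of `root` while the block runs."""
    cache[root] = list(root.iter())  # instantiate and keep all element objects
    try:
        yield cache[root]
    finally:
        del cache[root]  # release the objects once the tree is no longer needed


# Inline sample document (an assumption for this sketch)
XML = (
    b"<Forms><greeting>Hello, world!</greeting>"
    b"<Form_1><Country>AFG</Country><Country>IND</Country></Form_1></Forms>"
)

cache = {}
root = objectify.fromstring(XML)

with cached_elements(root, cache) as elements:
    # Document order: Forms, greeting, Form_1, Country, Country
    same = root.Form_1.Country is elements[3]

print(same, len(cache))
```

Inside the with block, attribute access returns the cached object; after the block the cache entry is gone and the memory can be reclaimed.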

Given an XML document, print the id of an element before and after populating the cache. Once the cache is populated, the id of the element matches the id of the cached object.

from lxml import objectify

otree = objectify.parse('tmp2.xml')
root = otree.getroot()

# before caching: the element object is created on access
print(id(root.Form_1.Country), root.Form_1.Country)

# populate the cache: index 3 is the first Country (Forms, greeting, Form_1, Country, ...)
cache = {}
cache[root] = list(root.iter())
print(id(cache[root][3]), cache[root][3])
print(id(root.Form_1.Country), root.Form_1.Country)

# both point to the same object in memory
print(root.Form_1.Country is cache[root][3])

# the element can be obtained in different ways and is still the cached object
ele1 = root.xpath('(//Form_1/Country)[1]')[0]
print(ele1 is cache[root][3])

# release the cached objects when done with the tree
del cache[root]

Result

140257476833728 AFG

140257476833280 AFG
140257476833280 AFG

True
True

As explained in the link posted by the OP, this trades memory for speed:

A way to improve the normal attribute access time is static instantiation of the Python objects, thus trading memory for speed
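
Whether the trade pays off depends on the document and on how it is accessed, so it is worth measuring. A rough sketch using timeit, where time_attribute_access is a hypothetical helper and the test XML is inlined; no particular speedup is implied:

```python
import timeit

from lxml import objectify

# Same structure as the test XML, inlined for this sketch
XML = b"""<Forms>
    <greeting>Hello, world!</greeting>
    <Form_1>
        <Country>AFG</Country>
        <Country>AFG</Country>
        <Country>IND</Country>
    </Form_1>
    <Form_1>
        <Country>IND</Country>
        <Country>USA</Country>
    </Form_1>
</Forms>"""


def time_attribute_access(root, number=10000):
    """Hypothetical helper: seconds taken by `number` accesses of root.Form_1.Country."""
    return timeit.timeit(lambda: root.Form_1.Country, number=number)


root = objectify.fromstring(XML)

# Without the cache
uncached = time_attribute_access(root)

# With the cache holding every element object alive
cache = {root: list(root.iter())}
cached = time_attribute_access(root)
del cache[root]  # done with the tree

print(f"uncached: {uncached:.4f}s  cached: {cached:.4f}s")
```

The absolute numbers vary by machine and lxml version; only the comparison between the two runs on the same document is meaningful.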

Test XML (saved as tmp2.xml)

<Forms>
    <greeting>Hello, world!</greeting>
    <Form_1>
        <Country>AFG</Country>
        <Country>AFG</Country>
        <Country>IND</Country>
    </Form_1>
    <Form_1>
        <Country>IND</Country>
        <Country>USA</Country>
    </Form_1>
</Forms>