On the lxml.de website (https://lxml.de/performance.html) I see the following statement:
A way to improve the normal attribute access time is static instantiation of the Python objects, thus trading memory for speed. Just create a cache dictionary and run:
cache[root] = list(root.iter())
after parsing, and:
del cache[root]
Can anyone provide a suitable Python code example of how the above mechanism can be used in a Python function?
Setting a variable like cache[root] = list(root.iter()) will effectively cache the objects in memory, as the simple test below demonstrates.
The caching mechanism is very simple: the list keeps a reference to every element proxy in the tree, so no matter how an element is obtained afterwards, it points to the same object at the same memory address.
Given an XML document, print the id() of an element before and after setting the cache. Once the cache is set, the id matches the one stored in the cache.
from lxml import objectify

# parse the test XML (tmp2.xml, shown at the bottom of this answer)
otree = objectify.parse('tmp2.xml')
root = otree.getroot()

# before caching: a fresh proxy object is created for this access
print(id(root.Form_1.Country), root.Form_1.Country)

# build the cache: keep a reference to every element proxy in the tree
cache = {}
cache[root] = list(otree.iter())

# the first Country element is the fourth element yielded by iter()
print(id(cache[root][3]), cache[root][3])
# after caching, attribute access returns the very same object
print(id(root.Form_1.Country), root.Form_1.Country)

# both point to the same object in memory
print(root.Form_1.Country is cache[root][3])
# the object can be obtained in different ways (here via XPath) but still
# points to the same object held in the cache
ele1 = root.xpath('(//Form_1/Country)[1]')[0]
print(ele1 is cache[root][3])
Result
140257476833728 AFG
140257476833280 AFG
140257476833280 AFG
True
True
As explained in the link posted by the OP, this is trading memory for speed:
A way to improve the normal attribute access time is static instantiation of the Python objects, thus trading memory for speed
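To put this into a function, as the question asks, here is a minimal sketch. The function name count_countries and the counting loop are only illustrative (they are not part of lxml's API); the relevant parts are building the cache right after parsing and deleting it once the work on the tree is finished, which is what the performance page suggests.

from lxml import objectify

def count_countries(filename):
    # parse the document and build the cache right after parsing
    tree = objectify.parse(filename)
    root = tree.getroot()
    cache = {root: list(root.iter())}

    counts = {}
    try:
        # repeated attribute access now reuses the cached proxy objects
        # instead of instantiating new ones on every lookup
        for form in root.Form_1:          # all Form_1 elements
            for country in form.Country:  # all Country children
                counts[country.text] = counts.get(country.text, 0) + 1
    finally:
        # drop the cache when done with the tree
        del cache[root]
    return counts

print(count_countries('tmp2.xml'))
# with the test XML below this prints {'AFG': 2, 'IND': 2, 'USA': 1}

The cache only pays off when the same elements are accessed repeatedly; for a single pass like this the gain is negligible, but the setup and teardown pattern is the same.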
Test XML
<Forms>
  <greeting>Hello, world!</greeting>
  <Form_1>
    <Country>AFG</Country>
    <Country>AFG</Country>
    <Country>IND</Country>
  </Form_1>
  <Form_1>
    <Country>IND</Country>
    <Country>USA</Country>
  </Form_1>
</Forms>
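To see the speed side of the trade-off, a rough comparison with timeit could look like the sketch below. It assumes the same tmp2.xml file (in practice a much larger document makes the difference visible), and the absolute numbers will of course vary.

import timeit
from lxml import objectify

tree = objectify.parse('tmp2.xml')   # a larger file shows the effect better
root = tree.getroot()

def access():
    # the repeated objectify attribute access the lxml docs talk about
    return root.Form_1.Country

# without the cache, the Python proxy objects are re-created on each access
t_no_cache = timeit.timeit(access, number=100_000)

# with the cache, the proxies stay alive and can be reused
cache = {root: list(root.iter())}
t_with_cache = timeit.timeit(access, number=100_000)
del cache[root]

print('no cache:  ', t_no_cache)
print('with cache:', t_with_cache)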