Python: 3.11, saxonche: 12.4.2
My website keeps consuming more and more memory until the server runs out of memory and crashes. I isolated the problematic code to the following script:
import gc
from time import sleep
from saxonche import PySaxonProcessor
xml_str = """
<root>
<stuff>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum ac auctor ex. Nunc in tincidunt urna. Sed tincidunt eros lacus, sed pulvinar sem venenatis et. Donec euismod orci quis pellentesque sagittis. Donec at tortor in dui mattis facilisis. Pellentesque vel varius lectus. Nunc sed gravida risus, ac finibus elit. Etiam sollicitudin nunc a velit efficitur molestie in ac lectus. Donec vulputate orci odio, sit amet hendrerit odio rhoncus commodo.</stuff>
<stuff>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum ac auctor ex. Nunc in tincidunt urna. Sed tincidunt eros lacus, sed pulvinar sem venenatis et. Donec euismod orci quis pellentesque sagittis. Donec at tortor in dui mattis facilisis. Pellentesque vel varius lectus. Nunc sed gravida risus, ac finibus elit. Etiam sollicitudin nunc a velit efficitur molestie in ac lectus. Donec vulputate orci odio, sit amet hendrerit odio rhoncus commodo.</stuff>
<stuff>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum ac auctor ex. Nunc in tincidunt urna. Sed tincidunt eros lacus, sed pulvinar sem venenatis et. Donec euismod orci quis pellentesque sagittis. Donec at tortor in dui mattis facilisis. Pellentesque vel varius lectus. Nunc sed gravida risus, ac finibus elit. Etiam sollicitudin nunc a velit efficitur molestie in ac lectus. Donec vulputate orci odio, sit amet hendrerit odio rhoncus commodo.</stuff>
</root>
"""
while True:
    print('Running once...')
    with PySaxonProcessor(license=False) as proc:
        proc.parse_xml(xml_text=xml_str)
    gc.collect()
    sleep(1)
This script consumes memory at a rate of about 0.5 MB per second, and usage never plateaus: I have logs showing memory growing for hours until the server runs out of memory and crashes.
Other things I tried that aren't shown above:

- Binding the return value of parse_xml() to a variable and then deleting it with the del Python keyword. No change.

I have to use Saxon instead of lxml because I need XPath 3.0 support.
What am I doing wrong? How do I parse XML using Saxon in a way that doesn't leak?
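For anyone trying to reproduce this, here is a minimal stdlib-only sketch of the kind of measurement behind the numbers above (Unix only, since the resource module is not available on Windows; the helper name is my own):

```python
import resource

def peak_rss_kib() -> int:
    # Peak resident set size of the current process.
    # Note: ru_maxrss is reported in KiB on Linux but in bytes on macOS.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

baseline = peak_rss_kib()
# ... run the parse loop for a while here ...
print(f"peak RSS grew by {peak_rss_kib() - baseline} KiB since baseline")
```

Sampling this before and after a batch of iterations makes the steady growth visible without any third-party profiler.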
A few folks have suggested that instantiating the PySaxonProcessor once before the loop will fix the leak. It doesn't. This still leaks:
with PySaxonProcessor(license=False) as proc:
    while True:
        print('Running once...')
        proc.parse_xml(xml_text=xml_str)
        gc.collect()
        sleep(1)
It looks like a memory leak in SaxonC itself, so I created a bug to track it: https://saxonica.plan.io/issues/6391

Update: the issue is now fixed in the released SaxonC 12.5.
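If you are hitting this, upgrading past the affected release is the fix; a minimal requirements.txt constraint reflecting the release noted above would be:

```
saxonche>=12.5
```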