I am using a SAX parser (xml.sax) and it works the way I want. However, I am parsing quite a large file (which is why I use SAX), and I would like to stop parsing at some point (e.g., when I have reached a certain limit, or when I have found a certain piece of data).
import xml.sax

class ProductHandler(xml.sax.ContentHandler):
    def startElement(self, tag, attrs):
        pass  # [.. process start ..]

    def endElement(self, tag):
        pass  # [.. process end ..]

    def characters(self, content):
        pass  # [.. process characters ..]

product_handler = ProductHandler()
parser = xml.sax.make_parser()
parser.setContentHandler(product_handler)
parser.parse(xmlfile)
How do I do that? Is there a particular value I can return from one of the handler methods? I checked the documentation, but I couldn't find anything in this direction.
Using this example data, if we want to find a <description> that contains the word "sourdough", we could raise a custom exception from the handler and catch it around parse(), something like this:
import xml.sax

# Custom exception used only to signal that we want to stop parsing.
class IAmAllDone(Exception):
    pass

class ProductHandler(xml.sax.handler.ContentHandler):
    def __init__(self):
        super().__init__()
        self.description = None
        self.name = None
        self.tree = []  # stack of currently open element names

    def startElement(self, name, attrs):
        self.tree.append(name)

    def endElement(self, name):
        # Pop the most recently opened element (the end of the list).
        self.tree.pop()

    def characters(self, content):
        if self.tree[-1] == "name" and content.strip():
            self.name = content
            print("name:", content)
        elif self.tree[-1] == "description" and "sourdough" in content:
            self.description = content
            # Abort parsing; the exception propagates out of parser.parse().
            raise IAmAllDone()

product_handler = ProductHandler()
parser = xml.sax.make_parser()
parser.setContentHandler(product_handler)

try:
    parser.parse("data.xml")
except IAmAllDone:
    pass

if product_handler.description is not None:
    print("found description:", product_handler.description)
The above will output:
name: Belgian Waffles
name: Strawberry Belgian Waffles
name: Berry-Berry Belgian Waffles
name: French Toast
found description: Thick slices made from our homemade sourdough bread
As you can see, we stop the SAX parsing before reading the final "Homestyle Breakfast" item.
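One caveat: a SAX parser is allowed to deliver the text of a single element in several characters() calls, so checking each chunk on its own can miss a word that happens to be split across chunks. Below is a minimal, self-contained sketch of the same raise-and-catch idea that buffers the text and checks it in endElement() instead. The names StopParsing, FirstMatchHandler, find_first, and SAMPLE are made up for illustration, and the inline XML is a stand-in for your real file, not the example data above.

```python
import xml.sax


class StopParsing(Exception):
    """Hypothetical signal used to abort parsing early."""


class FirstMatchHandler(xml.sax.ContentHandler):
    """Stops at the first <tag> element whose text contains `needle`."""

    def __init__(self, tag, needle):
        super().__init__()
        self.tag = tag
        self.needle = needle
        self.match = None
        self._buffer = None  # list of text chunks while inside <tag>

    def startElement(self, name, attrs):
        if name == self.tag:
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect them all.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == self.tag and self._buffer is not None:
            text = "".join(self._buffer)
            self._buffer = None
            if self.needle in text:
                self.match = text
                raise StopParsing()  # propagates out of parseString()


def find_first(xml_bytes, tag, needle):
    """Return the text of the first matching element, or None."""
    handler = FirstMatchHandler(tag, needle)
    try:
        xml.sax.parseString(xml_bytes, handler)
    except StopParsing:
        pass
    return handler.match


# Made-up sample data for the sketch.
SAMPLE = b"""<menu>
  <food><name>French Toast</name><description>homemade sourdough bread</description></food>
  <food><name>Homestyle Breakfast</name><description>two eggs and hash browns</description></food>
</menu>"""

print(find_first(SAMPLE, "description", "sourdough"))
```

For a file on disk you would keep using make_parser() and parser.parse(path) as in the answer above; parseString() is only used here so the sketch runs without an external file.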