I just need to get the text of a particular tag and persist it to a database. Since the XML file is big (4.5 GB), I'm using SAX. I use the characters method to get the text and store it in a dictionary. However, when I print the text in the endElement method, I get a newline instead of the text.
Here is my code:
    # Methods of my xml.sax.ContentHandler subclass
    def characters(self, content):
        text = unescape(content)
        self.map[self.tag] = text

    def startElement(self, name, attrs):
        self.tag = name

    def endElement(self, name):
        if name == "sometag":
            print(self.map[name])
Thanks in advance.
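The SAX parser may deliver the text of a tag in several chunks, so characters
might be called multiple times for one element. Your code overwrites the stored value on every call, so by the time endElement runs you only have the last chunk, which here is a newline.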
You need to do something like:
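    def startElement(self, name, attrs):
        # Start with an empty string for each newly opened element
        self.map[name] = ''
        self.tag = name

    def characters(self, content):
        # Append instead of overwriting: this may be called several times
        self.map[self.tag] += content

    def endElement(self, name):
        print(self.map[name])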
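For reference, here is a minimal self-contained sketch of the same idea that can be run as is (Python 3). The names TextCollector, collect_text and wanted are made up for this example, and it assumes the elements you care about are not nested inside each other. It collects the chunks in a list and joins them in endElement, which avoids repeated string concatenation, and it only buffers text while inside a wanted element, so whitespace between elements is ignored. The parser already resolves entities such as &amp;amp; before calling characters, so this sketch does not call unescape.

```python
import io
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Collects the full text of each element whose name is in `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.chunks = None   # list of text pieces while inside a wanted element
        self.results = []    # (tag, text) pairs; a database write could go here instead

    def startElement(self, name, attrs):
        if name in self.wanted:
            self.chunks = []  # start buffering

    def characters(self, content):
        # May be called several times per element, so append each piece
        if self.chunks is not None:
            self.chunks.append(content)

    def endElement(self, name):
        if name in self.wanted and self.chunks is not None:
            self.results.append((name, "".join(self.chunks)))
            self.chunks = None  # stop buffering until the next wanted element


def collect_text(xml_bytes, wanted):
    """Parse XML given as bytes and return the (tag, text) pairs for wanted tags."""
    handler = TextCollector(wanted)
    xml.sax.parse(io.BytesIO(xml_bytes), handler)
    return handler.results


sample = b"<root>\n<sometag>Tom &amp; Jerry</sometag>\n<other>skip</other>\n</root>"
print(collect_text(sample, ["sometag"]))
```

For a 4.5 GB file you would pass the file name or an open file object to xml.sax.parse instead of a BytesIO, and write each result to the database inside endElement rather than keeping everything in a list.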