I'm having trouble understanding Xerces-C++ memory management.
If I have this (example) XML file "config.xml":
<?xml version="1.0" encoding="UTF-8"?>
<settings>
    <port>
        <reference>Ref1</reference>
        <label>1PPS A</label>
        <enabled>true</enabled>
    </port>
</settings>
and this code:
#include <xercesc/dom/DOM.hpp>

XERCES_CPP_NAMESPACE_USE

DOMElement *nextChildElement(const DOMElement *parent)
{
    DOMNode *node = (DOMNode *)parent->getFirstChild();
    while (node)
    {
        if (node->getNodeType() == DOMNode::ELEMENT_NODE)
            return (DOMElement *)node;
        node = node->getNextSibling();
    }
    return nullptr;
}

int main(int argc, char **argv)
{
    XMLPlatformUtils::Initialize();
    XMLCh tempStr[100];
    XMLString::transcode("LS", tempStr, 99);
    DOMImplementation *impl = DOMImplementationRegistry::getDOMImplementation(tempStr);
    DOMLSParser *parser = ((DOMImplementationLS*)impl)->createLSParser(DOMImplementationLS::MODE_SYNCHRONOUS, 0);
    DOMDocument *doc = impl->createDocument(0, 0, 0);
    doc = parser->parseURI("config.xml");
    DOMElement *el = doc->getDocumentElement(); // <settings>
    el = nextChildElement(el); // <port>
    el = nextChildElement(el); // <reference>Ref1</reference>

    // Heap blows up here
    while (1) {
        char *cstr = XMLString::transcode(el->getTextContent());
        XMLString::release(&cstr); // cstr is "Ref1"
    }

    // and/or here
    while (1) {
        XMLCh *xstr = XMLString::replicate(el->getTextContent());
        char *cstr = XMLString::transcode(xstr); // cstr is "Ref1"
        XMLString::release(&cstr);
        XMLString::release(&xstr);
    }
}
Why does the program's heap memory blow up in the while (1) loops? Either loop results in the same memory problem:
Note: I'm using Visual Studio 2017, and I've tested this in these configurations (all with same results):
The problem is that the function const XMLCh *getTextContent() allocates memory on the Document's heap (using its MemoryManager), and there is no provision for the caller to deallocate that memory or mark it for recycling. So, once the returned pointer goes out of scope in the caller, the memory is essentially orphaned until the entire Document is released, at which time the MemoryManager deletes all of its heap allocations.
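To make the mechanism concrete, here is a toy model of an owner-held heap, assuming the behaviour described above. It is not Xerces code: ToyDocumentHeap, copyText() and bytesAfterRepeatedReads() are hypothetical names made up for this illustration. Every "read" hands out a fresh copy that the caller cannot give back, so the bytes held by the owner grow with each call and are only reclaimed when the owner itself is destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a document-owned heap (NOT the Xerces implementation).
// Blocks are handed out one by one but can only be reclaimed all at once,
// when the heap object is destroyed.
class ToyDocumentHeap {
public:
    // Copies `text` into memory owned by this heap and returns a pointer to the copy.
    // There is deliberately no way for the caller to free a single block.
    const char *copyText(const std::string &text) {
        const std::size_t size = text.size() + 1;   // include the terminating '\0'
        blocks_.push_back(std::make_unique<char[]>(size));
        std::memcpy(blocks_.back().get(), text.c_str(), size);
        bytesHeld_ += size;
        return blocks_.back().get();
    }

    // Total number of bytes handed out so far and still held by the heap.
    std::size_t bytesHeld() const { return bytesHeld_; }

private:
    std::vector<std::unique_ptr<char[]>> blocks_;
    std::size_t bytesHeld_ = 0;
};

// Mimics calling an allocating getter `calls` times in a loop and reports
// how many bytes the heap is holding afterwards.
std::size_t bytesAfterRepeatedReads(std::size_t calls) {
    ToyDocumentHeap heap;
    for (std::size_t i = 0; i < calls; ++i) {
        const char *copy = heap.copyText("Ref1");
        assert(std::strcmp(copy, "Ref1") == 0);
        // `copy` goes out of scope here, but the heap still owns the block.
    }
    return heap.bytesHeld();   // all blocks are freed only when `heap` is destroyed
}
```

In this model, a loop that never ends never lets the owner be destroyed, so the held bytes only ever increase. That is the pattern the while (1) loops in the question run into.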
The solution is to not use getTextContent() on the element, but to call getNodeValue() on its Text child node instead, which returns a pointer to the existing data rather than reallocating it off an internal heap.
That aside, getTextContent does not work anyway. It's buggy as all get out and is effectively useless. You can't read the DOM that way or you'll get inaccurate data back under a variety of different circumstances if there are non-adjacent Text nodes (and if there aren't, you don't need to use it anyway since the direct node value will be all you need).
So, a working version of the OP example code might look like this:
#include <xercesc/dom/DOM.hpp>
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/XMLString.hpp>
#include <string>

XERCES_CPP_NAMESPACE_USE

// Returns the first child of parent that is an element, or nullptr if there is none.
DOMElement *nextChildElement(const DOMElement *parent)
{
    DOMNode *node = (DOMNode *)parent->getFirstChild();
    while (node)
    {
        if (node->getNodeType() == DOMNode::ELEMENT_NODE)
            return (DOMElement *)node;
        node = node->getNextSibling();
    }
    return nullptr;
}

// Returns the value of el's first child if it is a Text node, else an empty string.
std::string readTextNode(const DOMElement *el)
{
    std::string sstr;
    DOMNode *node = el->getFirstChild();
    // An empty element has no children, so check for null before dereferencing.
    if (node && node->getNodeType() == DOMNode::TEXT_NODE) {
        // transcode() allocates a char buffer that we own and must release.
        char *cstr = XMLString::transcode(node->getNodeValue());
        sstr = cstr;
        XMLString::release(&cstr);
    }
    return sstr;
}

int main(int argc, char **argv)
{
    XMLPlatformUtils::Initialize();
    XMLCh tempStr[100];
    XMLString::transcode("LS", tempStr, 99);
    DOMImplementation *impl = DOMImplementationRegistry::getDOMImplementation(tempStr);
    DOMLSParser *parser = ((DOMImplementationLS*)impl)->createLSParser(DOMImplementationLS::MODE_SYNCHRONOUS, 0);
    // parseURI() returns the parsed document, so the extra createDocument()
    // call (whose result was overwritten and never released) is dropped.
    DOMDocument *doc = parser->parseURI("config.xml");
    DOMElement *el = doc->getDocumentElement(); // <settings>
    el = nextChildElement(el); // <port>
    el = nextChildElement(el); // <reference>Ref1</reference>

    // No memory leak
    std::string nodestr;
    while (1) {
        nodestr = readTextNode(el); // nodestr is "Ref1"
    }
}