How does one go about getting the text from the nodes and subnodes in TinyXML2?
The XMLPrinter class seems to do what I need, but it does not print the text properly.
My XML:
<div>The quick brown <b>fox</b> jumps over the <i>lazy</i> dog.</div>
My class which extends the XMLPrinter class:
class XMLTextPrinter : public XMLPrinter {
    virtual bool VisitEnter (const XMLDocument &) { return true; }
    virtual bool VisitExit (const XMLDocument &) { return true; }
    virtual bool VisitEnter (const XMLElement &e, const XMLAttribute *) {
        auto text = e.GetText();
        if(text) {
            std::cout << text;
        }
        return true;
    }
    virtual bool VisitExit (const XMLElement &e) { return true; }
    virtual bool Visit (const XMLDeclaration &) { return true; }
    virtual bool Visit (const XMLText &e) { return true; }
    virtual bool Visit (const XMLComment &) { return true; }
    virtual bool Visit (const XMLUnknown &) { return true; }
};
My code:
XMLDocument document;
document.Parse(..., ...);
auto elem = ...;
XMLTextPrinter printer;
elem->Accept(&printer);
The output:
The quick brown foxlazy
Why is it ignoring all the text that comes after the <b> and <i> elements? How can I solve this? Also, the XMLPrinter class prints it out properly with the tags, but I do not want the tags.
[Edited 14-Apr-17 to improve (I hope).]
XMLPrinter derives from XMLVisitor and prints the XML document (or element) in full, tags, attributes and all. XMLVisitor is the interface that Accept() calls while it recurses through the XML hierarchy: VisitEnter/VisitExit for nodes that can have descendants (children), i.e. documents and elements, and Visit for leaf nodes, i.e. text, comments etc. The default implementations do nothing. Override these methods in a derived class to implement the desired functionality.
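To make the traversal concrete, here is a minimal toy model of the visitor mechanism. It is only a sketch: Node, Text, Element, Visitor, TextCollector and CollectToyText are hypothetical names invented for illustration and are not TinyXML2's classes, although the enter/visit/exit flow is meant to mirror the one described above.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy model only: these types are hypothetical and are NOT TinyXML2's classes.
struct Text;
struct Element;

// Visitor interface with do-nothing defaults that return true ("keep going").
struct Visitor {
    virtual ~Visitor() = default;
    virtual bool VisitEnter(const Element&) { return true; }
    virtual bool VisitExit(const Element&) { return true; }
    virtual bool Visit(const Text&) { return true; }
};

struct Node {
    virtual ~Node() = default;
    virtual bool Accept(Visitor& v) const = 0;
};

// Leaf node: Accept just calls Visit.
struct Text : Node {
    std::string value;
    explicit Text(std::string s) : value(std::move(s)) {}
    bool Accept(Visitor& v) const override { return v.Visit(*this); }
};

// Container node: Accept calls VisitEnter, recurses into the children,
// then calls VisitExit.
struct Element : Node {
    std::string name;
    std::vector<std::unique_ptr<Node>> children;
    explicit Element(std::string n) : name(std::move(n)) {}
    bool Accept(Visitor& v) const override {
        if (v.VisitEnter(*this)) {
            for (const auto& child : children) {
                if (!child->Accept(v)) break;
            }
        }
        return v.VisitExit(*this);
    }
};

// Overrides only the leaf callback, so tags are skipped and text is kept.
struct TextCollector : Visitor {
    std::string out;
    bool Visit(const Text& t) override { out += t.value; return true; }
};

// Builds the toy tree for <div>The quick brown <b>fox</b> jumps</div>
// and returns the concatenated text of all its text nodes.
std::string CollectToyText() {
    Element div("div");
    div.children.push_back(std::make_unique<Text>("The quick brown "));
    auto b = std::make_unique<Element>("b");
    b->children.push_back(std::make_unique<Text>("fox"));
    div.children.push_back(std::move(b));
    div.children.push_back(std::make_unique<Text>(" jumps"));
    TextCollector collector;
    div.Accept(collector);
    return collector.out;
}
```

Because every text node is visited in document order, the text that follows the nested element is collected as well.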
The first problem is that you are deriving from XMLPrinter. This derives from XMLVisitor and creates a printable representation of the XML document. But then you replace all of XMLPrinter's Visit... methods with your own. It would be much better, and less work, to derive from XMLVisitor directly.
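Secondly, you're getting the element text from VisitEnter alone using GetText(). GetText() returns text only when the element's first child is a text node, so it misses any text that follows a child element, as the GetText() documentation notes. That is why only "The quick brown ", "fox" and "lazy" are printed.
In this case, to get only the text of all elements, override Visit for the text leaf nodes, i.e. Visit(const XMLText &).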
#include "tinyxml2.h"
#include <iostream>
using namespace tinyxml2;

class XMLPrintText : public XMLVisitor
{
public:
    // Called once for every text node, in document order.
    virtual bool Visit (const XMLText & txt) override
    {
        std::cout << txt.Value();
        return true;
    }
};

int main()
{
    XMLDocument doc;
    doc.Parse ("<div>The quick brown <b>fox</b> jumps over the <i>lazy</i> dog.</div>");
    auto div = doc.FirstChildElement();
    XMLPrintText prt;
    div->Accept (&prt);
    return 0;
}