c++ xml pugixml

How to get inner XML of a node in pugixml?


I parse a document and want to retrieve part of the XML tree as a string. The document (example):

<?xml version="1.0"?>
<MyConfig>
    <MyData>
        <Foo bar="baz>42</Foo>
    </MyData>
    <OtherData>Something</OtherData>
</MyConfig>

The code:

  pugi::xml_document doc;
  doc.load_file(documentFileName);
  pugi::xml_node root = doc.child("MyConfig");

  // parse custom data
  _customData = root.child("MyData"). <-- HOW TO GET INNER XML?

Expected contents of custom data (if formatting is lost, I don't mind):

"<Foo bar="baz>42</Foo>"

How to do this?


Solution

  • I think pugi::xml_node::print() is one way to do it.

    // requires <sstream> and <string>
    pugi::xml_node node = root.child("MyData");
    pugi::xml_node child = node.first_child();

    // serialize the child element (and its subtree) into a string
    std::stringstream ss;
    child.print(ss);
    std::string s = ss.str();
    

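    The trouble is that, with the document exactly as posted (the closing quote after baz is missing, so the parser reads the rest of the file as the attribute value), s will now have the value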

    <Foo bar="baz&gt;42&lt;/Foo&gt;     &lt;/MyData&gt;     &lt;OtherData&gt;Something&lt;/OtherData&gt; &gt; &#10;&lt;/MyConfig&gt;" />
    
    1. It contains the text of the document from that node onwards, and
    2. It's cluttered with XML escape sequences (&lt; and &gt;) rather than < and >.

    Not ideal, but both can be worked around with some string manipulation.

    // replace &lt; with <
    size_t off = 0;
    while ((off = s.find("&lt;", off)) != s.npos)
      s.replace(off, 4, "<");
    
    // replace &gt; with >
    off = 0;
    while ((off = s.find("&gt;", off)) != s.npos)
      s.replace(off, 4, ">");
    
    // truncate after the second '>' (the end of the closing tag);
    // this assumes the element has no nested child elements
    size_t end_open = s.find(">", 0);
    size_t end_close = s.find(">", end_open + 1);
    s = s.substr(0, end_close + 1);
    

    This leaves s with the value

    <Foo bar="baz>42</Foo>