c++ xml pugixml

pugixml - get all text nodes (PCDATA), not just the first


Currently, if I try to parse

<parent>
    First bit of text
    <child>
    </child>
    Second bit of text
</parent>

I only get "First bit of text" with

parent.text().get()

What's the correct way to grab all text nodes in parent?

  1. Is there a nice utility function for this?
  2. How could it be done by iterating through all children?

Solution

  • There is no function that concatenates all text; if you want to get a list of text node children, you have two options:

    1. XPath query:

       pugi::xpath_node_set ns = parent.select_nodes("text()");
      
       for (size_t i = 0; i < ns.size(); ++i)
           std::cout << ns[i].node().value() << std::endl;
      
    2. Manual iteration with type checking:

       for (pugi::xml_node child = parent.first_child(); child; child = child.next_sibling())
           if (child.type() == pugi::node_pcdata)
               std::cout << child.value() << std::endl;
      

    Note that if you can use C++11 then the second option can be much more concise:

    for (pugi::xml_node child: parent.children())
        if (child.type() == pugi::node_pcdata)
            std::cout << child.value() << std::endl;
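
With the default parse options, pugixml keeps the whitespace inside each text node, so for the document in the question child.value() would include the newlines and indentation around "First bit of text". If that is unwanted, one option is to trim each value yourself. A minimal sketch, assuming pugixml's default char mode; the name trim is hypothetical and not part of pugixml (pugixml also has a parse_trim_pcdata parse flag, which is off by default):

```cpp
#include <string>

// Strips leading and trailing whitespace (spaces, tabs, CR, LF) from a text value.
// Returns an empty string if the input is empty or whitespace only.
std::string trim(const std::string& s) {
    const char* whitespace = " \t\r\n";
    const std::string::size_type first = s.find_first_not_of(whitespace);
    if (first == std::string::npos) {
        return std::string();
    }
    const std::string::size_type last = s.find_last_not_of(whitespace);
    return s.substr(first, last - first + 1);
}
```

In the loops above this would be used as trim(child.value()) instead of child.value().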
    

    (Of course, you can also use a range-based for loop to iterate through an xpath_node_set.)
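
Since there is no built-in function that concatenates all text, a small helper of your own is one way to get a single string. The following is only a sketch: concat_children is a hypothetical name, not part of pugixml. It takes a predicate so that the caller decides which node types count as text (pugi::node_pcdata only, or pugi::node_cdata as well). Node is meant to be pugi::xml_node and is a template parameter only to keep the sketch free of a dependency on the pugixml headers. It assumes the default char mode, where value() returns const char*.

```cpp
#include <string>

// Concatenates the values of those direct children of `parent` that the
// predicate accepts. `Node` is intended to be pugi::xml_node: any type with a
// children() range whose elements provide value() will work.
template <typename Node, typename IsText>
std::string concat_children(const Node& parent, IsText is_text) {
    std::string result;
    for (const auto& child : parent.children()) {
        if (is_text(child)) {
            result += child.value();
        }
    }
    return result;
}
```

With pugixml this could be called as concat_children(parent, [](const pugi::xml_node& n) { return n.type() == pugi::node_pcdata; }). For the document in the question, the result should contain both bits of text, including the whitespace around them.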