c++yamlyaml-cpp

Iterate Through All Nodes In yaml-cpp Including Recursive Anchor/Alias


Given a YAML::Node how can we visit all Scalar Nodes within that node (to modify them)? My best guess is a recursive function:

void parseNode(YAML::Node node) {
    if (node.IsMap()) {
        for (YAML::iterator it = node.begin(); it != node.end(); ++it) {
            parseNode(it->second);
        }
    }
    else if (node.IsSequence()) {
        for (YAML::iterator it = node.begin(); it != node.end(); ++it) {
            parseNode(*it);
        }
    }
    else if (node.IsScalar()) {
        // Perform modifications here.
    }
}

This works fine for my needs except for in one case: if there is a recursive anchor/alias. (I think this is permitted by the YAML 1.2 spec? yaml-cpp certainly doesn't throw an Exception when parsing one) e.g.:

first: &firstanchor
  a: 5
  b: *firstanchor

The code above will follow the alias to the anchor repeatedly until crashing (SEGFAULT) presumably due to stack issues. How would one overcome this issue?


Solution

  • The most obvious way to identify nodes would be to use their m_pNode pointer; however that is not publicly queryable. The next best way is to use bool Node::is(const Node& rhs) const, which sadly doesn't allow us to do any sorting to speed up performance when searching for cycles.

    As you probably do not only want to avoid cycles, but also want to avoid descending into the same subgraph twice, you actually need to remember all visited nodes. This means that eventually you'll store all nodes in the subgraph you walk and have space complexity O(n), and since there is no order on nodes we can use, time complexity is O(n²) since each node must be compared to all previously seen nodes. That is horrible.

    Anyway, here's the code that does it:

    class Visitor {
    public:
      using Callback = std::function<void(const YAML::Node&)>;
      Visitor(Callback cb): seen(), cb(cb) {}
    
      void operator()(const YAML::Node &cur) {
        seen.push_back(cur);
        if (cur.IsMap()) {
          for (const auto &pair : cur) {
            descend(pair.second);
          }
        } else if (node.IsSequence()) {
          for (const auto &child : cur) {
            descend(child);
          }
        } else if (node.IsScalar()) {
          cb(cur);
        }
      }
    private:
      void descend(const YAML::Node &target) {
        if (std::find(seen.begin(), seen.end(), target) != seen.end())
         (*this)(target);
      }
    
      std::vector<YAML::Node> seen;
      Callback cb;
    };
    

    (We can use std::find since operator== uses Node::is internally.)

    And then you can do

    Visitor([](const YAML::Node &scalar) {
      // do whatever with the scalar
    })(node);
    

    (forgive me for errors, my C++ is a bit rusty; I hope you get the idea.)