1

Given a YAML::Node how can we visit all Scalar Nodes within that node (to modify them)? My best guess is a recursive function:

void parseNode(YAML::Node node) {
    if (node.IsMap()) {
        for (YAML::iterator it = node.begin(); it != node.end(); ++it) {
            parseNode(it->second);
        }
    }
    else if (node.IsSequence()) {
        for (YAML::iterator it = node.begin(); it != node.end(); ++it) {
            parseNode(*it);
        }
    }
    else if (node.IsScalar()) {
        // Perform modifications here.
    }
}

This works fine for my needs except for in one case: if there is a recursive anchor/alias. (I think this is permitted by the YAML 1.2 spec? yaml-cpp certainly doesn't throw an Exception when parsing one) e.g.:

first: &firstanchor
  a: 5
  b: *firstanchor

The code above will follow the alias to the anchor repeatedly until crashing (SEGFAULT) presumably due to stack issues. How would one overcome this issue?

  • Is there a better way to iterate through an entire unknown Node structure that would solve this?
  • Is there a way to check from the Node with the public API if it is an alias so we can maintain a stack of previous visited aliases to detect looping and break.
  • Or is there a way to fetch some unique Node identifier so I can maintain a stack of previously visited nodes?
  • Or finally is this recursive anchor/alias just not spec-compliant - in which case how can I check for its occurrence to ensure I can return an appropriate error?

1 Answer 1

2

The most obvious way to identify nodes would be to use their m_pNode pointer; however that is not publicly queryable. The next best way is to use bool Node::is(const Node& rhs) const, which sadly doesn't allow us to do any sorting to speed up performance when searching for cycles.

As you probably do not only want to avoid cycles, but also want to avoid descending into the same subgraph twice, you actually need to remember all visited nodes. This means that eventually you'll store all nodes in the subgraph you walk and have space complexity O(n), and since there is no order on nodes we can use, time complexity is O(n²) since each node must be compared to all previously seen nodes. That is horrible.

Anyway, here's the code that does it:

class Visitor {
public:
  using Callback = std::function<void(const YAML::Node&)>;
  Visitor(Callback cb): seen(), cb(cb) {}

  void operator()(const YAML::Node &cur) {
    seen.push_back(cur);
    if (cur.IsMap()) {
      for (const auto &pair : cur) {
        descend(pair.second);
      }
    } else if (node.IsSequence()) {
      for (const auto &child : cur) {
        descend(child);
      }
    } else if (node.IsScalar()) {
      cb(cur);
    }
  }
private:
  void descend(const YAML::Node &target) {
    if (std::find(seen.begin(), seen.end(), target) != seen.end())
     (*this)(target);
  }

  std::vector<YAML::Node> seen;
  Callback cb;
};

(We can use std::find since operator== uses Node::is internally.)

And then you can do

Visitor([](const YAML::Node &scalar) {
  // do whatever with the scalar
})(node);

(forgive me for errors, my C++ is a bit rusty; I hope you get the idea.)

Sign up to request clarification or add additional context in comments.

4 Comments

Not bad! I had forgotten about Node::is. I'll reserve accepting for a couple of days to see if there are other solutions, but thank you!
I've realised we can do better than this. As node uses is for operator== we can use std::find(stack.begin(), stack.end(), target) to search the stack.
@SamBob Indeed, I incorporated this. You could also have a stack of std:pair<YAML:Node, YAML:iterator> instead if you want to get rid of recursion; you'd then have a loop that advances the top iterator, push any new unknown nodes it finds, and pops when the iterator is finished. I didn't write that code because it strays too far from the original question.
@SamBob I've come to the realization that you mustn't pop the stack if you not only want to avoid cycles, but also want to avoid visiting any node twice. I updated my answer for that, also changing the Visitor API to use operator() since the object should be created and used in one line and not re-used (since a new walk needs a new seen list).

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.