
How do I handle recursion in a custom PyYAML constructor?


PyYAML can handle cyclic graphs of regular Python objects. For example:

Snippet #1.

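import yaml

class Node: pass
a = Node()
b = Node()
a.child = b
b.child = a
# We now have the cycle a->b->a
serialized_object = yaml.dump(a)
# yaml.Loader can build arbitrary Python objects, so only use it on trusted input
loaded_object = yaml.load(serialized_object, Loader=yaml.Loader)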

This code succeeds, so clearly there's some mechanism to prevent infinite recursion when loading the serialized object. How do I harness that when I write my own YAML constructor function?

For example, say Node is a class with transient fields foo and bar, and a persistent (non-transient) field child. Only child should make it into the YAML document. I would hope to do this:

Snippet #2.

def representer(dumper, node):
  return dumper.represent_mapping("!node", {"child": node.child})

def constructor(loader, data):
  result = Node()
  mapping = loader.construct_mapping(data)
  result.child = mapping["child"]
  return result

yaml.add_representer(Node, representer)
yaml.add_constructor("!node", constructor)

# Retry object cycle a->b->a from earlier code snippet
serialized_object = yaml.dump(a)
print(serialized_object)
loaded_object = yaml.load(serialized_object, Loader=yaml.Loader)

But it fails. The dump succeeds and prints the document, then the load raises a ConstructorError:

&id001 !node
child: !node
  child: *id001

yaml.constructor.ConstructorError: found unconstructable recursive node:
  in "<string>", line 1, column 1:
    &id001 !node

I see why. My constructor function isn't built for recursion. It needs to return the child object before it finishes constructing the parent object, and that fails when the child and parent are the same object.

But clearly PyYAML has graph traversals that solve this problem, because Snippet #1 works. Maybe there's one pass to construct all the objects and a second pass to populate their fields. My question is, how can my custom constructor tie into those mechanisms?

An answer to that question would be ideal. But if the answer is that I can't do this with custom constructors, and there is a less desirable alternative (e.g. mixing the YAMLObject class into my Node class), then that answer would be appreciated too.


Solution

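  • For complex types that might involve recursion (mapping/dict, sequence/list, objects), the constructor cannot create the object in one go. Instead, yield the newly created object from the constructor() function and fill in its values after the yield¹. PyYAML registers the yielded object straight away, so an alias pointing back at it can resolve, and it resumes the generator later to run the rest of the function: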

    def constructor(loader, data):
        result = Node()
        yield result
        mapping = loader.construct_mapping(data)
        result.child = mapping["child"]
    

    That gets rid of the error.

    ¹ I don't think this is documented anywhere. Had I not studied py/constructor.py intensively while upgrading PyYAML to ruamel.yaml, I would not have known how to do this. A typical case of: read the source, Luke.
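
For completeness, here is a minimal end-to-end sketch that combines the question's representer with the generator-style constructor and round-trips the a->b->a cycle. It assumes a reasonably recent PyYAML; the function names represent_node and construct_node, the Node.__init__ defaults, and the explicit Loader=yaml.Loader arguments are choices made for this illustration, not something the original post prescribes. yaml.Loader can construct arbitrary objects, so this should only be used on trusted input.

```python
import yaml


class Node:
    def __init__(self):
        self.foo = None    # transient: never written to the YAML document
        self.bar = None    # transient
        self.child = None  # the only field that is serialized


def represent_node(dumper, node):
    # Emit only the persistent field under the custom !node tag.
    return dumper.represent_mapping("!node", {"child": node.child})


def construct_node(loader, yaml_node):
    result = Node()
    # Hand the still-empty object to the loader first, so an alias that
    # points back at this node can resolve to it.
    yield result
    # The loader resumes the generator later; now fill in the fields.
    mapping = loader.construct_mapping(yaml_node)
    result.child = mapping["child"]


yaml.add_representer(Node, represent_node)
# Registering on yaml.Loader explicitly is a choice made for this sketch.
yaml.add_constructor("!node", construct_node, Loader=yaml.Loader)

a = Node()
b = Node()
a.child = b
b.child = a  # cycle a->b->a

serialized = yaml.dump(a)
loaded = yaml.load(serialized, Loader=yaml.Loader)

# The cycle survives the round trip.
assert isinstance(loaded.child, Node)
assert loaded.child is not loaded
assert loaded.child.child is loaded
```

The identity check on the last line is the important one: the loaded parent and the grandchild reached through child.child are the same object, which means the alias was resolved to the yielded instance rather than to a copy.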