PyYAML can handle cyclic graphs in regular Python objects. For example:
Snippet #1.
import yaml

class Node: pass
a = Node()
b = Node()
a.child = b
b.child = a
# We now have the cycle a->b->a
serialized_object = yaml.dump(a)
object = yaml.load(serialized_object)
This code succeeds, so clearly there's some mechanism to prevent infinite recursion when loading the serialized object. How do I harness that when I write my own YAML constructor function?
For example, say Node is a class with transient fields foo and bar, and a non-transient field child. Only child should make it into the YAML document. I would hope to do this:
Snippet #2.
def representer(dumper, node):
    return dumper.represent_mapping("!node", {"child": node.child})

def constructor(loader, data):
    result = Node()
    mapping = loader.construct_mapping(data)
    result.child = mapping["child"]
    return result

yaml.add_representer(Node, representer)
yaml.add_constructor("!node", constructor)
# Retry object cycle a->b->a from earlier code snippet
serialized_object = yaml.dump(a)
print(serialized_object)
object = yaml.load(serialized_object)
But it fails:
&id001 !node
child: !node
  child: *id001

yaml.constructor.ConstructorError: found unconstructable recursive node:
  in "<string>", line 1, column 1:
    &id001 !node
I see why. My constructor function isn't built for recursion. It needs to return the child object before it finishes constructing the parent object, and that fails when the child and parent are the same object.
But clearly PyYAML has graph traversals that solve this problem, because Snippet #1 works. Maybe there's one pass to construct all the objects and a second pass to populate their fields. My question is, how can my custom constructor tie into those mechanisms?
An answer to that question would be ideal. But if the answer is that I can't do this with custom constructors, and there is a less desirable alternative (e.g. mixing the YAMLObject class into my Node class), then that answer would be appreciated too.
For complex types that might involve recursion (mapping/dict, sequence/list, objects), the constructor cannot create the object in one go. You should therefore yield the constructed object in the constructor() function, and then update any values after that¹:
def constructor(loader, data):
    result = Node()
    yield result
    mapping = loader.construct_mapping(data)
    result.child = mapping["child"]
That gets rid of the error.
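For completeness, here is a minimal end-to-end sketch of the round trip with the yielding constructor. It assumes Python 3 and PyYAML 5.1 or later. The names represent_node and construct_node are just illustrative, and registering the constructor on yaml.SafeLoader is my choice for the example rather than something the question requires.

```python
import yaml


class Node:
    def __init__(self):
        self.foo = None    # transient, never written to YAML
        self.bar = None    # transient, never written to YAML
        self.child = None  # the only field that is serialized


def represent_node(dumper, node):
    # Emit only the "child" field under the custom !node tag.
    return dumper.represent_mapping("!node", {"child": node.child})


def construct_node(loader, yaml_node):
    # Step 1: hand the empty object back to the loader right away, so
    # that an alias pointing at this node can already resolve to it.
    result = Node()
    yield result
    # Step 2: the loader resumes the generator later; fill in the fields.
    mapping = loader.construct_mapping(yaml_node)
    result.child = mapping["child"]


yaml.add_representer(Node, represent_node)
# Assumption: register on SafeLoader so no arbitrary Python tags are needed.
yaml.add_constructor("!node", construct_node, Loader=yaml.SafeLoader)

# Rebuild the cycle a -> b -> a from the question.
a = Node()
b = Node()
a.child = b
b.child = a
a.foo = "scratch value"  # transient data that must not be dumped

serialized_object = yaml.dump(a)
restored = yaml.load(serialized_object, Loader=yaml.SafeLoader)

# The cycle survives the round trip: restored -> child -> restored.
print(restored.child.child is restored)
```

The important part is the order inside construct_node: the object is yielded before construct_mapping() is called, so when the loader reaches the alias inside the child it finds an already-registered (if still empty) object instead of raising ConstructorError.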
¹ I don't think this is documented anywhere. Without looking at py/constructor.py intensively while upgrading PyYAML to ruamel.yaml, I would not have known how to do this. A typical case of: read the source, Luke.