import sys
import ruamel.yaml
yaml_str = """\
hello: world
foo: &core_foo
  s: 1
"""
yaml_str2 = """\
hello1 : world
foo:
  <<: *core_foo
"""
yaml = ruamel.yaml.YAML()
yaml.allow_duplicate_keys = True
yaml.dump(data, sys.stdout)
data = yaml.load(yaml_str + yaml_str2)
I tried to concatenate the two strings and load them with duplicate keys allowed. While the result of the load is as I expected, the dump does not preserve the merge and the aliases.
Expected:
hello: world
foo:
  <<: *core_foo
hello1: world
Actual:
hello: world
foo:
  s: 1
hello1: world
Is this the expected behaviour?
First of all, it is unlikely that your program generates the output you show, because you set data by loading the concatenated strings after you dump it. I am also not sure why you concatenate the strings, but that might be a remnant from experimenting with the code.
The behaviour is as expected. When allowing duplicate keys, ruamel.yaml drops any recurring instances. Some other parsers don't check for duplicate keys and silently overwrite the original entry (but by then the alias will already have been resolved, so the merged mapping data will probably be there).
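As an illustration only (this assumes PyYAML is installed; it is not something the question uses), such a parser resolves the alias while parsing and simply lets the second foo replace the first:
import yaml as pyyaml  # PyYAML, imported under another name so it doesn't clash with the YAML() instance

# the merge has already pulled in the anchored mapping, and the last duplicate key wins
print(pyyaml.safe_load(yaml_str + yaml_str2))
# expected to print something like: {'hello': 'world', 'foo': {'s': 1}, 'hello1': 'world'}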
In ruamel.yaml the second key-value pair foo, and with it the "merge", although they get parsed, are then dropped. This causes the value for the first key foo to have an anchor, but that value has only one reference. The anchor (core_foo) is attached to the data structure, as can be seen from the output of the code below.
During the dump ruamel.yaml tracks the nodes that are going to be dumped, and if the same (Python) id is encountered, the first occurrence gets an anchor and any following occurrence an alias. So essentially you have to hold off dumping any node until you know whether it needs an anchor (i.e. essentially walk over the data structure twice). Since the second occurrence of foo gets discarded, there is no second reference to the data structure, and the initial occurrence never needs an anchor.
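A minimal sketch of that id-based tracking (the names shared, a and b are made up for this sketch, and id001 is just ruamel.yaml's default anchor template):
import sys
import ruamel.yaml

yaml = ruamel.yaml.YAML()
shared = {'s': 1}                      # one Python object ...
yaml.dump({'a': shared, 'b': shared},  # ... referenced from two keys
          sys.stdout)
# the first occurrence gets an anchor, the second an alias, roughly:
# a: &id001
#   s: 1
# b: *id001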
You can easily check that behaviour by changing foo in your yaml_str2 to a key that doesn't already occur in the first mapping.
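For example (bar is just an arbitrary replacement name for this sketch), the renamed key is no longer a duplicate, so it is not dropped and the anchor, alias and merge survive the round-trip:
yaml_str2_renamed = """\
hello1: world
bar:
  <<: *core_foo
"""
yaml = ruamel.yaml.YAML()
yaml.allow_duplicate_keys = True
data = yaml.load(yaml_str + yaml_str2_renamed)
yaml.dump(data, sys.stdout)
# expected to keep the anchor and the merge, roughly:
# hello: world
# foo: &core_foo
#   s: 1
# hello1: world
# bar:
#   <<: *core_foo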
It is however possible to force the dumping of a loaded anchor by setting its always_dump attribute. There is no global option on the YAML() instance to do that, so you either need to know where the anchor is located or recursively walk the data structure:
import sys
import ruamel.yaml

yaml_str = """\
hello: world
foo: &core_foo
  s: 1
hello1 : world
foo:
  <<: *core_foo
"""

yaml = ruamel.yaml.YAML()
yaml.allow_duplicate_keys = True
data = yaml.load(yaml_str)
print(data['foo'].anchor)
print('=' * 10)
yaml.dump(data, sys.stdout)
print('=' * 10)

def always_dump_anchors(d):
    # recursively mark every anchor found in the data structure for dumping
    if isinstance(d, dict):
        for k, v in d.items():
            always_dump_anchors(k)
            always_dump_anchors(v)
    elif isinstance(d, list):
        for elem in d:
            always_dump_anchors(elem)
    if hasattr(d, 'anchor'):
        d.anchor.always_dump = True

always_dump_anchors(data)
yaml.dump(data, sys.stdout)
which gives:
Anchor('core_foo')
==========
hello: world
foo:
  s: 1
hello1: world
==========
hello: world
foo: &core_foo
  s: 1
hello1: world
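If you already know where the anchor is located, you can skip the recursive walk and set the attribute on that one node directly (a small variation reusing yaml and yaml_str from above):
data = yaml.load(yaml_str)
data['foo'].anchor.always_dump = True   # force the anchor even though there is only one reference
yaml.dump(data, sys.stdout)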
Keeping track of ids is necessary in any kind of data structure representation that might be self-referencing. Since this takes time, some "dumpers", like the json package in the standard library, allow you to speed things up by declaring that your data structure is not self-referencing (json.dump does this by providing the check_circular=False argument). Even your average __repr__ should do this, as became clear when ordereddict was originally added to Python 2: it would crash on self-referential structures (and that although the author of that change was aware of a test suite for ordereddict implementations that included tests for this).
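As a small illustration of that id tracking with the standard library (nothing ruamel.yaml specific):
import json

lst = [1, 2]
lst.append(lst)        # the list now contains itself
try:
    json.dumps(lst)    # with the default check_circular=True the cycle is detected
except ValueError as exc:
    print(exc)         # prints something like: Circular reference detected
# with json.dumps(lst, check_circular=False) the cycle goes undetected and the
# encoder recurses until Python raises a RecursionError instead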