I have the following YAML snippet.
I want to resolve the alias to its anchor (i.e. expand the merge), but I don't want to lose the !flatten and !ref tags,
which are to be processed by another program.
Input:
_ip_context: &ip_context
  ip_restriction: !flatten
  - !ref 'constant::public_cidr_blocks'
policies:
- file: policies/somepolicy.json
  context:
    <<: *ip_context
Desired output:
policies:
- file: policies/somepolicy.json
  context:
    ip_restriction: !flatten
    - !ref 'constant::public_cidr_blocks'
I tried this program, which was produced by ChatGPT, but it didn't give me what I wanted:
import sys
import yaml

yaml_content = """
_ip_context: &ip_context
  ip_restriction: !flatten
  - !ref 'constant::public_cidr_blocks'
policies:
- file: policies/somepolicy.json
  context:
    <<: *ip_context
"""

class FlattenConstructor(yaml.constructor.SafeConstructor):
    def construct_flatten(self, node):
        return self.construct_sequence(node)

class RefConstructor(yaml.constructor.SafeConstructor):
    def construct_ref(self, node):
        return self.construct_scalar(node)

yaml.add_constructor('!flatten', FlattenConstructor.construct_flatten, Loader=yaml.SafeLoader)
yaml.add_constructor('!ref', RefConstructor.construct_ref, Loader=yaml.SafeLoader)
data = yaml.load(yaml_content, Loader=yaml.SafeLoader)

class FlattenRepresenter(yaml.representer.SafeRepresenter):
    def represent_flatten(self, data):
        return self.represent_sequence('!flatten', data)

class RefRepresenter(yaml.representer.SafeRepresenter):
    def represent_ref(self, data):
        return self.represent_scalar('!ref', data)

yaml.add_representer(list, FlattenRepresenter.represent_flatten)
yaml.add_representer(str, RefRepresenter.represent_ref)
#with open('output.yaml', 'w') as outfile:
yaml.dump(data, sys.stdout, default_flow_style=False, Dumper=yaml.SafeDumper)
This is the output:
_ip_context:
  ip_restriction: &id001
  - constant::public_cidr_blocks
policies:
- context:
    ip_restriction: *id001
  file: policies/somepolicy.json
I asked generative "AI" programs some questions about Python and YAML (about which I imagine I know a thing or two), and had a good laugh at the answers they gave.
The code doesn't create different types for tagged and non-tagged sequences and scalars, so the output would have had tags attached to all of them if the representer code had worked. The code also fails to do anything to prevent the anchor and aliases from being created.
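To make those two points concrete, here is a minimal PyYAML sketch, not the approach recommended further down. It assumes distinct Python types for the tagged nodes and a dumper that never emits aliases; the names Flatten, Ref, TagLoader and TagDumper are made up for this illustration.

```python
import yaml

class Flatten(list):
    """A sequence that carried the !flatten tag (hypothetical helper type)."""

class Ref(str):
    """A scalar that carried the !ref tag (hypothetical helper type)."""

class TagLoader(yaml.SafeLoader):
    """Subclass so the constructors are not registered globally."""

class TagDumper(yaml.SafeDumper):
    def ignore_aliases(self, data):
        # Never write anchors/aliases, even for objects that occur twice.
        return True

def construct_flatten(loader, node):
    return Flatten(loader.construct_sequence(node, deep=True))

def construct_ref(loader, node):
    return Ref(loader.construct_scalar(node))

TagLoader.add_constructor('!flatten', construct_flatten)
TagLoader.add_constructor('!ref', construct_ref)

# Only the dedicated types get a tag; plain lists and strings stay untagged.
TagDumper.add_representer(
    Flatten, lambda dumper, data: dumper.represent_sequence('!flatten', list(data)))
TagDumper.add_representer(
    Ref, lambda dumper, data: dumper.represent_scalar('!ref', str(data)))

yaml_content = """\
_ip_context: &ip_context
  ip_restriction: !flatten
  - !ref 'constant::public_cidr_blocks'
policies:
- file: policies/somepolicy.json
  context:
    <<: *ip_context
"""

data = yaml.load(yaml_content, Loader=TagLoader)  # the loader expands the merge
del data['_ip_context']                           # drop the anchored helper mapping
out = yaml.dump(data, Dumper=TagDumper, sort_keys=False)
print(out)
```

Unlike the ruamel.yaml route, this does not round-trip comments or quoting style, and sort_keys requires PyYAML 5.1 or later.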
Removing aliases without removing the anchor is described here.
In your case things are, on the one hand, simpler: you can just remove the part of the loaded data structure that you don't want,
which gets rid of the anchor and the alias.
ruamel.yaml will preserve the tags for you without your having to do anything special, but it will also preserve the merge key, which
is not what you want. To get rid of that you could update the representer, but that would require duplicating a rather large
piece of code from the method represent_mapping. So my preference is to just walk recursively over the data structure and
get rid of the merge information (which is equally dependent on ruamel.yaml internals, so pin the version you use):
import sys
from pathlib import Path
import ruamel.yaml

file_name = Path('input.yaml')

def un_merge(d):
    if isinstance(d, dict):
        if d.merge:
            # copy the merged-in key/value pairs into the mapping itself
            for kvs in d.merge:
                for k1, v1 in kvs[1].items():  # kvs[0] is the position of the merge
                    d[k1] = v1
            # drop the merge information so that no << key is dumped
            delattr(d, ruamel.yaml.comments.merge_attrib)
        for k, v in d.items():
            un_merge(k)
            un_merge(v)
    elif isinstance(d, list):
        for elem in d:
            un_merge(elem)

yaml = ruamel.yaml.YAML()
yaml.preserve_quotes = True
data = yaml.load(file_name)
del data['_ip_context']  # the anchored mapping is not wanted in the output
un_merge(data)
yaml.dump(data, sys.stdout)
which gives:
policies:
- file: policies/somepolicy.json
  context:
    ip_restriction: !flatten
    - !ref 'constant::public_cidr_blocks'
and that looks like your desired output.
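The recursive walk can also be illustrated without any ruamel.yaml internals. The following is only a sketch under a loud assumption: the data is made of plain dicts and lists in which a merge is still present as a literal '<<' key. That is not how ruamel.yaml stores merges, and plain dicts carry no tags. The function name resolve_merges is made up for this example.

```python
def resolve_merges(node):
    """Return a copy of node in which every literal '<<' key is expanded."""
    if isinstance(node, dict):
        merged = {}
        sources = node.get('<<', [])
        if isinstance(sources, dict):
            sources = [sources]  # a single mapping instead of a list of mappings
        # Earlier merge sources take precedence, so only fill in missing keys.
        for source in sources:
            for key, value in resolve_merges(source).items():
                merged.setdefault(key, value)
        # Keys given explicitly in the mapping override merged-in keys.
        for key, value in node.items():
            if key != '<<':
                merged[key] = resolve_merges(value)
        return merged
    if isinstance(node, list):
        return [resolve_merges(item) for item in node]
    return node


ip_context = {'ip_restriction': ['constant::public_cidr_blocks']}
data = {
    'policies': [
        {'file': 'policies/somepolicy.json', 'context': {'<<': ip_context}},
    ],
}
result = resolve_merges(data)
print(result)
```

Returning a new structure, rather than mutating in place as un_merge does, keeps the sketch simple and leaves the input untouched.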
By default ruamel.yaml removes superfluous quotes, and the quotes around constant::public_cidr_blocks
are not necessary for parsers that correctly handle colons within scalars (not all do). However, within tagged
scalars they are preserved, regardless of preserve_quotes. Only comment out the yaml.preserve_quotes = True line if you have untagged
scalars with superfluous quotes (those quotes will then be dropped).
The order of the mapping keys is preserved (it wasn't in the output you got).
If there had been comments on the anchor part of the original mapping, these would not have been "moved" automagically.