I would like to set a template for the key order and line spacing of a YAML file and apply this to the repository of hundreds of YAML files I have. In general, I would like to do the following:
I am using python 3.10 and ruamel.yaml version. At a very basic level, I understand that the YAML object in ruamel.yaml is based upon an ordered dictionary and the accepted answer here seems like a simple way to ensure a specific order of a dictionary's keys, but I don't know how to apply that to the YAML object.
To maintain comments, I presume that the .ca attribute can be copied, although I don't know how to then apply the line spacing rules from the template.
Further complicating the matter is that some keys themselves may have multiple values (I think these would be a CommentedSequence in ruamel.yaml?), each of which should follow the templated order, and the last one will need a blank line after it.
Here is a basic version of the template that should provide an overview of the structure I'm talking about:
template='''
name:
region:
origin:
description:
go_live_date:
status:
governance:
  business_owner:
    am:
    eu:
    ap:
  technical_owner:
    am:
    eu:
    ap:
architecture:
  protocol:
  platform:
environments:
  - name:
    description:
    tier:
    locations:
'''
In the following example, the key order is wrong, line spacing is missing in some places and doubled in others, and there are some comments:
'''
name: MyApp
description: My wonderful application
origin: internal
governance:
  technical_owner:
    am:
      - Nico Ferrell
    ap:
      - Benedict Berger
      - Elsie Parsons
    eu:
      - Frances Case
  business_owner:
    eu:
      - Audrey Dalton
    am:
      - John Carpenter # to be updated
architecture:
  protocol: [TCP]
  platforms: [python_3_10, java_16]
status: in production
go_live_date: 2024-01-01
environments:
  - name: EU Prod
    description: production environment for EMEA
    tier: production
    locations: [ABC, XYZ]
  - name: EU UAT
    description: UAT environment for EMEA
    locations: [LMN]
    tier: uat
# further environmental details to be added
'''
After applying the template and steps outlined to this example, the resultant file should look like this:
'''
name: MyApp
origin: internal
description: My wonderful application
status: in production
go_live_date: 2024-01-01
governance:
  technical_owner:
    am:
      - Nico Ferrell
    eu:
      - Frances Case
    ap:
      - Benedict Berger
      - Elsie Parsons
  business_owner:
    am:
      - John Carpenter # to be updated
    eu:
      - Audrey Dalton
architecture:
  protocol: [TCP]
  platforms: [python_3_10, java_16]
environments:
  - name: EU Prod
    description: production environment for EMEA
    tier: production
    locations: [ABC, XYZ]
  - name: EU UAT
    description: UAT environment for EMEA
    tier: uat
    locations: [LMN]
# further environmental details to be added
'''
I don't know how to tackle this and would appreciate some help
You can tackle this by writing a program that uses the order of the keys in the template to re-insert the keys of the example in that order.
You can either use the .insert() method that is available on the CommentedMap() instance that is used to load a YAML mapping, inserting at position 0 using the reverse key order from the template. Or you can use the normal key order and pop and assign: that moves the first key to the back, followed by the others, until the first key is at the front again.
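The pop-and-assign variant can be illustrated on a plain dict, which preserves insertion order in the same way. This is only a minimal sketch: the helper name move_keys_to_template_order is made up for illustration, and a plain dict stands in for the CommentedMap that ruamel.yaml actually returns.

```python
def move_keys_to_template_order(template_keys, d):
    """Reorder dict d in place by popping and re-assigning each template key.

    Hypothetical helper for illustration; d is a plain dict standing in
    for a loaded mapping.
    """
    for key in template_keys:
        if key in d:
            # pop removes the key, the assignment appends it at the end
            d[key] = d.pop(key)
    return d


example = {
    "name": "MyApp",
    "description": "My wonderful application",
    "origin": "internal",
}
# 'region' is in the template but missing from the example, so it is skipped
move_keys_to_template_order(["name", "region", "origin", "description"], example)
print(list(example))  # ['name', 'origin', 'description']

# keys that the template does not know stay in front of the templated keys
arch = {"protocol": ["TCP"], "platforms": ["python_3_10", "java_16"]}
move_keys_to_template_order(["protocol", "platform"], arch)
print(list(arch))  # ['platforms', 'protocol']
```

Note the second case: a key that does not appear in the template is never popped, so it ends up before all templated keys rather than after them.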
To execute that, you can use a function that keeps a path to find the corresponding data structure in the example, or recurse over both in parallel.
import sys
from pathlib import Path

import ruamel.yaml

yaml = ruamel.yaml.YAML()
yaml.indent(mapping=2, sequence=4, offset=2)
yaml.preserve_quotes = True
template = yaml.load(Path('template.yaml'))
data = yaml.load(Path('example.yaml'))


def reorder(t, d):
    if isinstance(t, dict):
        for k, v in t.items():
            try:
                # pop and re-assign moves the key to the end of the mapping
                dv = d.pop(k)
            except KeyError:
                # this handles e.g. the key 'region' that is missing from the example
                continue
            d[k] = dv
            reorder(v, dv)
    elif isinstance(t, list):
        # assume the template has one element, the example multiple
        for elem in d:
            reorder(t[0], elem)


reorder(template, data)
yaml.dump(data, sys.stdout)
which gives:
name: MyApp
origin: internal
description: My wonderful application
go_live_date: 2024-01-01
status: in production
governance:
  business_owner:
    am:
      - John Carpenter # to be updated
    eu:
      - Audrey Dalton
  technical_owner:
    am:
      - Nico Ferrell
    eu:
      - Frances Case
    ap:
      - Benedict Berger
      - Elsie Parsons
architecture:
  platforms: [python_3_10, java_16]
  protocol: [TCP]
environments:
  - name: EU Prod
    description: production environment for EMEA
    tier: production
    locations: [ABC, XYZ]
  - name: EU UAT
    description: UAT environment for EMEA
    tier: uat
# further environmental details to be added
    locations: [LMN]
This gets your keys in the order of the template, but doesn't handle the empty lines. That is on purpose, as we only recurse into the template data structure, and the newline after "John Carpenter" is part of a sequence that is not part of the template (as you can check with print(data['governance']['business_owner']['am'].ca)).
Because of the way ruamel.yaml currently processes comments, attaching them to the last fully parsed node, the comment # further.. is associated with the key 'tier', and properly shifts position with reordering (although that might not be what you want).
Since ruamel.yaml was conceived to update values in existing YAML (config) files while preserving as much as possible (key order, comments, empty lines), and you are certainly not doing anything close to that, you'll have some work to do for the other steps.
I would first walk over the resulting example data and print the comments you find:
def remove_empty_lines(d):
    if isinstance(d, dict):
        for k, v in d.items():
            if d.ca.comment:
                print('comment', d.ca.comment)
            if (itemc := d.ca.items.get(k)) is not None:
                print('itemc', v, itemc)
            remove_empty_lines(v)
    elif isinstance(d, list):
        for idx, elem in enumerate(d):
            if d.ca.comment:
                print('lcomment', d.ca.comment)
            if (itemc := d.ca.items.get(idx)) is not None:
                print('litemc', elem, itemc)
            remove_empty_lines(elem)


remove_empty_lines(data)
which gives:
itemc in production [None, [CommentToken('\n\n', line: 22, col: 0)], None, None]
litemc John Carpenter [CommentToken('# to be updated\n\n', line: 17, col: 23), None, None, None]
litemc Frances Case [CommentToken('\n\n', line: 11, col: 8), None, None, None]
itemc uat [None, None, CommentToken('\n# further environmental details to be added\n', line: 35, col: 0), None]
So you will need to inspect those items and update the CommentToken. E.g. by using
print(dir(data['governance']['business_owner']['am'].ca.items[0][0]))
print(data['governance']['business_owner']['am'].ca.items[0][0].value)
you'll see that the .value attribute contains the actual comment, which you can strip of spurious newlines.
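The string handling for that step could look like the following. It is a sketch on plain strings only, assuming (as the printed tokens above suggest) that the extra newlines at the end of .value are what produce the empty lines. The helper name collapse_trailing_newlines is made up, and whether the cleaned value dumps the way you want should be checked against your own files.

```python
import re


def collapse_trailing_newlines(value):
    """Reduce a run of two or more newlines at the end of value to one.

    Hypothetical helper: it only touches the end of the string, so a
    leading newline (a comment on its own line) is left alone.
    """
    return re.sub(r"\n{2,}\Z", "\n", value)


# the three kinds of values seen in the printed CommentTokens
print(repr(collapse_trailing_newlines("# to be updated\n\n")))
# '# to be updated\n'
print(repr(collapse_trailing_newlines("\n\n")))
# '\n'
print(repr(collapse_trailing_newlines("\n# further environmental details to be added\n")))
# '\n# further environmental details to be added\n'
```

The result would then be assigned back to the .value attribute of the token in question.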
Once that is done, walk over both template and data once more, check the template for comments, and insert/update them in the example. Make sure to create new CommentTokens; do not copy them from the template. Examples for that you can find here