I am using YAML files to let users configure a serial workflow for a Python program that I am developing:
step1:
  method1:
    param_x: 44
  method2:
    param_y: 14
    param_t: string
  method1:
    param_x: 22
step2:
  method2:
    param_z: 7
  method1:
    param_x: 44
step3:
  method3:
    param_a: string
This is then parsed in Python and stored as a dictionary. Now, I know duplicate keys in YAML and Python dictionaries are not allowed (why, btw?), but YAML seems perfect for my case given its clarity and minimalism.
I tried to follow an approach suggested in this question (Getting duplicate keys in YAML using Python). However, in my case the keys are sometimes duplicated and sometimes not, and with the proposed construct_yaml_map this creates either a dict or a list, which is not what I want. Depending on the node depth, I would like to be able to send keys and values on the second level (method1, method2, ...) to a list within a Python dictionary, to avoid the duplication issue.
While parsing, ruamel.yaml has no concept of depth beyond being at the root level of a document (among other things, in order to allow root-level literal scalars to be unindented). Adding such a notion of depth is going to be difficult, since you have to deal with aliases and possible recursive occurrences of data. I am also not sure what this would mean in general (although it is clear enough for your example).
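To make the point about aliases concrete, here is a small plain-Python sketch (no YAML library involved). The names `shared`, `data`, and `depths_of` are made up for this illustration; `shared` stands in for an anchored mapping that is referenced through an alias in two places. It assumes there are no recursive structures, which would need extra bookkeeping to avoid looping forever.

```python
# Plain-Python illustration: an aliased node is a single object that is
# reachable along several paths, so it has no single "depth".
shared = {"param_x": 44}  # stands in for an anchored mapping
data = {
    "step1": {"method1": shared},             # reached at depth 2
    "step2": {"group": {"method1": shared}},  # same object, reached at depth 3
}


def depths_of(target, node, depth=0):
    """Return every depth at which `target` (by identity) occurs below `node`."""
    found = []
    if node is target:
        found.append(depth)
    if isinstance(node, dict):
        for value in node.values():
            found.extend(depths_of(target, value, depth + 1))
    return found


print(depths_of(shared, data))  # [2, 3]
```

A rule such as "turn second-level values into lists" would therefore have to decide what to do with an object that sits on the second level along one path and on the third along another.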
The method that creates a mapping in the default, round-trip loader of ruamel.yaml is rather long. But if you are going to jumble mapping values together, you should not expect to be able to dump them back, let alone preserve comments, aliases, etc. The following assumes you'll be using the simpler safe loader, and that the input may have aliases and/or merge keys.
from collections.abc import Hashable

import ruamel.yaml
from ruamel.yaml.constructor import ConstructorError
from ruamel.yaml.nodes import MappingNode

yaml_str = """\
step1:
  method1:
    param_x: 44
  method2:
    param_y: 14
    param_t: string
  method1:
    param_x: 22
step2:
  method2:
    param_z: 7
  method1:
    param_x: 44
step3:
  method3:
    param_a: string
"""


class MyConstructor(ruamel.yaml.constructor.SafeConstructor):
    def construct_mapping(self, node, deep=False):
        if not isinstance(node, MappingNode):
            raise ConstructorError(
                None, None, 'expected a mapping node, but found %s' % node.id, node.start_mark
            )
        total_mapping = self.yaml_base_dict_type()
        if getattr(node, 'merge', None) is not None:
            todo = [(node.merge, False), (node.value, False)]
        else:
            todo = [(node.value, True)]
        for values, check in todo:
            mapping = self.yaml_base_dict_type()  # type: Dict[Any, Any]
            for key_node, value_node in values:
                # keys can be list -> deep
                key = self.construct_object(key_node, deep=True)
                # lists are not hashable, but tuples are
                if not isinstance(key, Hashable):
                    if isinstance(key, list):
                        key = tuple(key)
                # Python 3 only: the Python 2 branch of the original method is dropped
                if not isinstance(key, Hashable):
                    raise ConstructorError(
                        'while constructing a mapping',
                        node.start_mark,
                        'found unhashable key',
                        key_node.start_mark,
                    )
                value = self.construct_object(value_node, deep=deep)
                if key in mapping:
                    # duplicate key: collect the values in a list instead of overwriting
                    if not isinstance(mapping[key], list):
                        mapping[key] = [mapping[key]]
                    mapping[key].append(value)
                else:
                    mapping[key] = value
            total_mapping.update(mapping)
        return total_mapping


yaml = ruamel.yaml.YAML(typ='safe')
yaml.Constructor = MyConstructor
data = yaml.load(yaml_str)
for k1 in data:
    # might need to guard this with a try-except for non-dictionary first-level values
    for k2 in data[k1]:
        if not isinstance(data[k1][k2], list):  # make every second level value a list
            data[k1][k2] = [data[k1][k2]]
print(data['step1'])
which gives:
{'method1': [{'param_x': 44}, {'param_x': 22}], 'method2': [{'param_y': 14, 'param_t': 'string'}]}
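The comment about guarding against non-dictionary first-level values could be handled along the following lines. This is only a sketch: `listify_second_level` and the `version` key are hypothetical names used for illustration, and the function works on any already-loaded dict, so it does not depend on ruamel.yaml.

```python
def listify_second_level(data):
    """Wrap every second-level value in a list unless it already is one."""
    for first_value in data.values():
        if not isinstance(first_value, dict):
            continue  # e.g. a top-level scalar or sequence: leave it untouched
        for key, value in first_value.items():
            if not isinstance(value, list):
                first_value[key] = [value]  # existing key, so safe while iterating
    return data


example = {
    "step1": {
        "method1": [{"param_x": 44}, {"param_x": 22}],  # already merged duplicates
        "method2": {"param_y": 14, "param_t": "string"},
    },
    "version": 2,  # hypothetical non-mapping first-level value
}
print(listify_second_level(example))
```

Note that a second-level value that was a genuine YAML sequence in the input cannot be told apart from merged duplicate keys at this point; if that distinction matters, it has to be made in the constructor.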