I am using ruamel.yaml to parse a complex YAML document where certain tagged nodes require special treatment. I inject my custom parsing logic using add_multi_constructor
, as recommended by the published examples. The problem is that I need to change the injected logic dynamically depending on external states but the decoration methods like add_multi_constructor
modify the global state which introduces unacceptable coupling between logically unrelated instances. Here is the MWE:
import ruamel.yaml
def get_loader(parameter):
def construct_node(constructor: ruamel.yaml.Constructor, tag: str, node: ruamel.yaml.Node):
return parameter(tag.lstrip("!"), str(node.value))
loader = ruamel.yaml.YAML()
loader.constructor.add_multi_constructor("", construct_node)
return loader
foo = get_loader(lambda tag, node: f"foo: {tag}, {node}")
bar = get_loader(lambda tag, node: f"bar: {tag}, {node}")
print(foo.load("!abc 123"), bar.load("!xyz 456"), sep="\n")
Output:
bar: abc, 123
bar: xyz, 456
Expected:
foo: abc, 123
bar: xyz, 456
I made the following workaround where I create new class instances dynamically to break the coupling:
def get_loader(parameter):
def construct_node(constructor: ruamel.yaml.Constructor, tag: str, node: ruamel.yaml.Node):
return parameter(tag.lstrip("!"), str(node.value))
# Create a new class to prevent state sharing through class attributes.
class ConstructorWrapper(ruamel.yaml.constructor.RoundTripConstructor):
pass
loader = ruamel.yaml.YAML()
loader.Constructor = ConstructorWrapper
loader.constructor.add_multi_constructor("", construct_node)
return loader
My questions are:
Am I misusing the library? The global effects are a huge red flag which suggests that I am using the API incorrectly, but the library lacks any API documentation so I am not sure what would be the correct way.
Is it safe in the sense of API breakage? Since there is no documented API for this, I am not sure if this is safe to put into production.
IMO you are not misusing the library, just working around its current shortcomings/incompleteness.
Before ruamel.yaml
got the API with the YAML()
instance, it had the function
based API of PyYAML with a few extensions, and other PyYAML's problems had to be worked around in a
similar unnatural way. E.g. I reverted to having classes whose instances could be called (using
__call__()
) on which methods could then be changed to just have access to
YAML documents version parsed from a document (as ruamel.yaml supports YAML
1.2 and 1.1 and PyYAML only (partially) supports 1.1).
But underneath ruamel.yaml's YAML()
instance not all has improved. The code
inherited from PyYAML stores the information for the various constructors
in the class attributes as lookup tables (on yaml_constructor
resp
yaml_multi_constructor
), and ruamel.yaml still does that (as the full old
PyYAML-escque API is effectively still there, and only with version 0.17 has gotten a future
deprecation warning).
Your approach is in so far interesting in that you do:
loader.constructor.add_multi_constructor("", construct_node)
instead of:
loader.Constructor.add_multi_constructor("", construct_node)
(you probably know that loader.constructor
is a property that instantiates
loader.Constructor
if necessary, but other readers of this answer might not)
or even:
def get_loader(parameter):
def construct_node(constructor: ruamel.yaml.Constructor, tag: str, node: ruamel.yaml.Node):
return parameter(tag.lstrip("!"), str(node.value))
# Create a new class to prevent state sharing through class attributes.
class ConstructorWrapper(ruamel.yaml.constructor.RoundTripConstructor):
pass
ConstructorWrapper.add_multi_constructor("", construct_node)
loader = ruamel.yaml.YAML()
loader.Constructor = ConstructorWrapper
return loader
That your code works, is because constructors are stored on the class attribute as .add_multi_constructor()
is a class method.
So what you do is not entirely safe in the sense of API breakage. ruamel.yaml is not at version
1.0 yet, and (API) changes that potentially break your code could come with any
minor version number change. You should set your version dependencies accordingly for
your production code (e.g. ruamel.yaml<0.18
), and update that minor number only after testing with a ruamel.yaml version with a new minor version number.
It is possible to transparently change the use of the class attributes by updating
the classmethods add_constructor()
and add_multi_constructor()
to "normal"
methods and have the initialisation of the lookup tables done in __init__()
.
Both your examples that call the instance:
loader.constructor.add_multi_constructor("", construct_node)
will get the desired result, but ruamel.yaml's behaviour would not change
when calling add_multi_constructor
on the class using:
loader.Constructor.add_multi_constructor("", construct_node)
However changing classmethods add_constructor()
and add_multi_constructor()
in this way affects all code out there, that happens to provide an instance
instead of the class (and said code being fine with the result).
It is more likely that two new
instance methods will be added either to the Constructor
class and to the YAML()
instance , and that the class method will
be either phased out or changed to check on a class and not an instance being
passed in, after a deprecation period with warnings (as will the global functions add_constructor()
and add_multi_constructor()
inherited from PyYAML).
The main advice, apart from having your production code fixed on the minor
version number, is to make sure your testing code displays
PendingDeprecationWarning
. If you are using pytest
this is the case by
default.
That should give you ample time to adapt your code to what the warning
recommends.
And if ruamel.yaml's author stops being lazy, he might provide some documentation for such API additions/changes.
import ruamel.yaml
import types
import inspect
class MyConstructor(ruamel.yaml.constructor.RoundTripConstructor):
_cls_yaml_constructors = {}
_cls_yaml_multi_constructors = {}
def __init__(self, *args, **kw):
self._yaml_constructors = {
'tag:yaml.org,2002:null': self.__class__.construct_yaml_null,
'tag:yaml.org,2002:bool': self.__class__.construct_yaml_bool,
'tag:yaml.org,2002:int': self.__class__.construct_yaml_int,
'tag:yaml.org,2002:float': self.__class__.construct_yaml_float,
'tag:yaml.org,2002:binary': self.__class__.construct_yaml_binary,
'tag:yaml.org,2002:timestamp': self.__class__.construct_yaml_timestamp,
'tag:yaml.org,2002:omap': self.__class__.construct_yaml_omap,
'tag:yaml.org,2002:pairs': self.__class__.construct_yaml_pairs,
'tag:yaml.org,2002:set': self.__class__.construct_yaml_set,
'tag:yaml.org,2002:str': self.__class__.construct_yaml_str,
'tag:yaml.org,2002:seq': self.__class__.construct_yaml_seq,
'tag:yaml.org,2002:map': self.__class__.construct_yaml_map,
None: self.__class__.construct_undefined
}
self._yaml_constructors.update(self._cls_yaml_constructors)
self._yaml_multi_constructors = self._cls_yaml_multi_constructors.copy()
super().__init__(*args, **kw)
def construct_non_recursive_object(self, node, tag=None):
# type: (Any, Optional[str]) -> Any
constructor = None # type: Any
tag_suffix = None
if tag is None:
tag = node.tag
if tag in self._yaml_constructors:
constructor = self._yaml_constructors[tag]
else:
for tag_prefix in self._yaml_multi_constructors:
if tag.startswith(tag_prefix):
tag_suffix = tag[len(tag_prefix) :]
constructor = self._yaml_multi_constructors[tag_prefix]
break
else:
if None in self._yaml_multi_constructors:
tag_suffix = tag
constructor = self._yaml_multi_constructors[None]
elif None in self._yaml_constructors:
constructor = self._yaml_constructors[None]
elif isinstance(node, ScalarNode):
constructor = self.__class__.construct_scalar
elif isinstance(node, SequenceNode):
constructor = self.__class__.construct_sequence
elif isinstance(node, MappingNode):
constructor = self.__class__.construct_mapping
if tag_suffix is None:
data = constructor(self, node)
else:
data = constructor(self, tag_suffix, node)
if isinstance(data, types.GeneratorType):
generator = data
data = next(generator)
if self.deep_construct:
for _dummy in generator:
pass
else:
self.state_generators.append(generator)
return data
def get_args(*args, **kw):
if kw:
raise NotImplementedError('can currently only handle positional arguments')
if len(args) == 2:
return MyConstructor, args[0], args[1]
else:
return args[0], args[1], args[2]
def add_constructor(self, tag, constructor):
self, tag, constructor = MyConstructor.get_args(*args, **kw)
if inspect.isclass(self):
self._cls_yaml_constructors[tag] = constructor
return
self._yaml_constructors[tag] = constructor
def add_multi_constructor(*args, **kw): # self, tag_prefix, multi_constructor):
self, tag_prefix, multi_constructor = MyConstructor.get_args(*args, **kw)
if inspect.isclass(self):
self._cls_yaml_multi_constructors[tag_prefix] = multi_constructor
return
self._yaml_multi_constructors[tag_prefix] = multi_constructor
def get_loader_org(parameter):
def construct_node(constructor: ruamel.yaml.Constructor, tag: str, node: ruamel.yaml.Node):
return parameter(tag.lstrip("!"), str(node.value))
loader = ruamel.yaml.YAML()
loader.Constructor = MyConstructor
loader.constructor.add_multi_constructor("", construct_node)
return loader
foo = get_loader_org(lambda tag, node: f"foo: {tag}, {node}")
bar = get_loader_org(lambda tag, node: f"bar: {tag}, {node}")
print('>org<', foo.load("!abc 123"), bar.load("!xyz 456"), sep="\n")
def get_loader_instance(parameter):
def construct_node(constructor: ruamel.yaml.Constructor, tag: str, node: ruamel.yaml.Node):
return parameter(tag.lstrip("!"), str(node.value))
# Create a new class to prevent state sharing through class attributes.
class ConstructorWrapper(MyConstructor):
pass
loader = ruamel.yaml.YAML()
loader.Constructor = ConstructorWrapper
loader.constructor.add_multi_constructor("", construct_node)
return loader
foo = get_loader_instance(lambda tag, node: f"foo: {tag}, {node}")
bar = get_loader_instance(lambda tag, node: f"bar: {tag}, {node}")
print('>instance<', foo.load("!abc 123"), bar.load("!xyz 456"), sep="\n")
def get_loader_cls(parameter):
def construct_node(constructor: ruamel.yaml.Constructor, tag: str, node: ruamel.yaml.Node):
return parameter(tag.lstrip("!"), str(node.value))
# Create a new class to prevent state sharing through class attributes.
class ConstructorWrapper(MyConstructor):
pass
loader = ruamel.yaml.YAML()
loader.Constructor = ConstructorWrapper
loader.Constructor.add_multi_constructor("", construct_node)
# ^ using the virtual class method
return loader
foo = get_loader_cls(lambda tag, node: f"foo: {tag}, {node}")
bar = get_loader_cls(lambda tag, node: f"bar: {tag}, {node}")
print('>cls<', foo.load("!abc 123"), bar.load("!xyz 456"), sep="\n")
which gives:
>org<
foo: abc, 123
bar: xyz, 456
>instance<
foo: abc, 123
bar: xyz, 456
>cls<
bar: abc, 123
bar: xyz, 456