python yaml ruamel.yaml

Using YAML anchors across different files with Python / ruamel.yaml


I have created 2 YAML files that look like the following:

file1.yaml:

variables:
  host1: &host1 'https://example.com'

file2.yaml:

app:
 host: *host1

As you can see, I want to use the YAML anchor &host1 in a different YAML file. I know that YAML itself does not support this, but I was told it is possible using a Python script and ruamel.yaml:

import ruamel.yaml
 
def load_and_merge_yaml(base_file, target_file):
    yaml = ruamel.yaml.YAML()
 
    # Load the base YAML file (contains anchors)
    with open(base_file, 'r') as base_file_obj:
        base_data = yaml.load(base_file_obj)
 
    # Load the target YAML file (we want to reuse anchors here)
    with open(target_file, 'r') as target_file_obj:
        target_data = yaml.load(target_file_obj)
 
    # Manually merge the base data into the target data
    # We will handle this by iterating through the base data and adding/replacing it in the target.
    # If a key exists in the base data, we will add it to the target if it's missing
    def merge_dicts(base, target):
        for key, value in base.items():
            if isinstance(value, dict):
                # If the value is a dictionary, we recursively merge
                if key not in target:
                    target[key] = {}  # Create the key if it's missing
                merge_dicts(value, target[key])  # Recursively merge nested dictionaries
            else:
                # If the value is not a dictionary, just assign it to the target
                if key not in target:
                    target[key] = value  # Add if missing
 
    # Perform the merge
    merge_dicts(base_data, target_data)
 
    # Save the merged result to a new file
    with open('merged_output.yaml', 'w') as output_file:
        yaml.dump(target_data, output_file)
 
    return target_data
 
# Example usage
base_file = 'file1.yaml'  # The file containing the anchors
target_file = 'file2.yaml'  # The file where we want to apply anchors
merged_data = load_and_merge_yaml(base_file, target_file)
 
# Print the merged data for inspection
print(ruamel.yaml.dump(merged_data, default_flow_style=False))

People tell me this should work, but unfortunately I keep getting this error:

ruamel.yaml.composer.ComposerError: found undefined alias 'host1'
  in "file2.yaml", line 2, column 9

Does anyone know what I'm doing wrong? Below is the output I would like.

app:
 host: 'https://example.com'

Solution

  • What you are doing wrong is listening to people who have no clue. This will not work, not even if file2.yaml did not immediately fail to load. This looks to me like a solution a generative AI would provide.

    Within the base_data instance you can find the anchor information attached to the anchored object (using the .anchor property). You can also find this anchor information by inspecting the yaml object after loading (IIRC it is the .anchors attribute on yaml.composer, which is only "emptied" when a new document is found, not at the end of a document).

    That information doesn't bring you an easy solution. This is because you would have to pre-populate the .anchors attribute of the Composer instance your yaml is using. This requires subclassing the RoundTripComposer, with a method compose_document that looks like:

        def compose_document(self: Any) -> Any:
            self.parser.get_event()
            node = self.compose_node(None, None)
            self.parser.get_event()
            return node
    

    i.e. one that doesn't reinitialise self.anchors, and then registering that version with the yaml instance.

    But if you go that far, you can register the subclassed Composer before loading file1.yaml; the anchor information is then (probably) preserved between the two load operations for the different documents (there might be other methods you need to override).

    It is probably easier to concatenate the first and second file, load that combination, and then remove the keys you found in base_data from target_data. That way you follow the YAML spec, which IMO you should.