I have created 2 YAML files that look like the following:
file1.yaml:
    variables:
      host1: &host1 'https://example.com'
file2.yaml:
    app:
      host: *host1
As you can see, I want to use the YAML anchor &host1 in a different YAML file. Now I know that YAML does not support this, but it is possible using a Python script and ruamel.yaml.
    import ruamel.yaml

    def load_and_merge_yaml(base_file, target_file):
        yaml = ruamel.yaml.YAML()
        # Load the base YAML file (contains anchors)
        with open(base_file, 'r') as base_file_obj:
            base_data = yaml.load(base_file_obj)
        # Load the target YAML file (we want to reuse anchors here)
        with open(target_file, 'r') as target_file_obj:
            target_data = yaml.load(target_file_obj)

        # Manually merge the base data into the target data
        # We will handle this by iterating through the base data and adding/replacing it in the target.
        # If a key exists in the base data, we will add it to the target if it's missing
        def merge_dicts(base, target):
            for key, value in base.items():
                if isinstance(value, dict):
                    # If the value is a dictionary, we recursively merge
                    if key not in target:
                        target[key] = {}  # Create the key if it's missing
                    merge_dicts(value, target[key])  # Recursively merge nested dictionaries
                else:
                    # If the value is not a dictionary, just assign it to the target
                    if key not in target:
                        target[key] = value  # Add if missing

        # Perform the merge
        merge_dicts(base_data, target_data)
        # Save the merged result to a new file
        with open('merged_output.yaml', 'w') as output_file:
            yaml.dump(target_data, output_file)
        return target_data

    # Example usage
    base_file = 'file1.yaml'  # The file containing the anchors
    target_file = 'file2.yaml'  # The file where we want to apply anchors
    merged_data = load_and_merge_yaml(base_file, target_file)
    # Print the merged data for inspection
    print(ruamel.yaml.dump(merged_data, default_flow_style=False))
And people tell me this should work but unfortunately I keep getting this error:
    ruamel.yaml.composer.ComposerError: found undefined alias 'host1'
      in "file2.yaml", line 2, column 9
Does anyone know what I'm doing wrong? Below is the output I would like.
    app:
      host: 'https://example.com'
What you are doing wrong is listening to people who have no clue. This will not work, not even if file2.yaml did not immediately fail to load. This looks to me like a solution a generative AI would provide.
Within the base_data instance you can find the anchor information attached to the anchored object (using the .anchor property). You can also find this anchor information by inspecting the yaml object after loading (IIRC it is the .anchors attribute on yaml.composer, which is only "emptied" when a new document is found, not at the end of a document).
That information doesn't bring you an easy solution. This is because you would have to pre-populate the .anchors attribute of the Composer instance your yaml is using. This requires subclassing the RoundTripComposer, with a method compose_document that looks like:
    def compose_document(self: Any) -> Any:
        self.parser.get_event()
        node = self.compose_node(None, None)
        self.parser.get_event()
        return node
i.e. one that doesn't initialise self.anchors, and then registering that version with the yaml instance.
But if you go that far, you can register the subclassed Composer before loading file1.yaml; the anchor information is then (probably) preserved between the two load operations for the different documents (there might be other methods you need to subclass).
It is probably easier to concatenate the first and second file, load that combination, and then remove the keys you found in base_data from target_data. That way you follow the YAML spec, which IMO you should.