I'm having trouble preserving YAML anchors on numeric values when using ruamel.yaml. The problem occurs only with the number 0; all other numeric values work fine. Here's what's happening:
Context: I'm using ruamel.yaml to parse and manipulate YAML files in Python. I need to keep anchors for numeric values intact, but here's the problem:
from ruamel.yaml import YAML, ScalarInt, PlainScalarString

# Custom loader to attempt to preserve anchors for numeric values
class CustomLoader(YAML):
    def __init__(self):
        super().__init__(typ='rt')
        self.preserve_quotes = True
        self.explicit_start = True
        self.default_flow_style = False

    def construct_yaml_int(self, node):
        value = super().construct_yaml_int(node)
        if node.anchor:
            # Preserve the anchor for numeric values
            if value == 0:
                return PlainScalarString("0", anchor=node.anchor.value)
            else:
                return ScalarInt(value, anchor=node.anchor.value)
        return value

yaml = CustomLoader()

# Load the YAML file
with open('current.yaml', 'r') as current_file:
    current_data = yaml.load(current_file)

print("Debug: current_data after load:", current_data)
for key, value in current_data.items():
    print(f"Debug: Key '{key}', value type: {type(value)}, has anchor: {hasattr(value, 'anchor')}, anchor value: {getattr(value, 'anchor', None)}")
current.yaml:
person: &person_age 0
person: &person_age 1 # this works
Expected Behavior: The anchor &person_age should be preserved for the person key with the value 0.
Actual Behavior: The anchor is not preserved; hasattr(value, 'anchor') returns False, and the value type is <class 'int'> rather than ScalarInt or PlainScalarString with an anchor.
What I've tried: I've tried to override construct_yaml_int in a custom loader to manually preserve anchors for integers, but it doesn't seem to work. I've ensured that ruamel.yaml is configured with typ='rt' for round-trip preservation. I've experimented with quoting the 0 in the YAML file (person: &person_age "0"), which does preserve the anchor, but this isn't a feasible solution for my use case, where users might not quote their numeric values.
Question: How can I ensure that anchors are preserved for the numeric value 0 when using ruamel.yaml? Is there a way to force ruamel.yaml to handle anchors for numbers without needing them to be quoted in the source YAML?
Any insights or alternative approaches would be greatly appreciated.
Versions: Python 3.12.5, ruamel.yaml 0.18.6
On loading your current.yaml you should get an error, because YAML requires unique keys in a mapping. After fixing that, you should get a warning that you redefine the anchor person_age.
But that is not the cause for 0 to lose its anchor. The cause is that the constructor for integers has quite a bit of special code for handling integers starting with the character '0' (and different code for handling octals in YAML 1.1 and 1.2). That code still had a shortcut for the string consisting of only "0", thereby never reaching the code that properly handles anchored integer scalars (a later addition, not tested with 0).
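To make the failure mode concrete, here is a deliberately simplified sketch of how an early-return shortcut can bypass anchor handling that was added later. This is not ruamel.yaml's actual code: AnchoredInt, construct_int_with_shortcut and construct_int_fixed are hypothetical names made up for this illustration, and the anchor is reduced to a plain string.

```python
class AnchoredInt(int):
    """Hypothetical stand-in for an int subclass that remembers its anchor."""

    def __new__(cls, value, anchor=None):
        obj = super().__new__(cls, value)
        obj.anchor = anchor
        return obj


def construct_int_with_shortcut(text, anchor=None):
    # Early return for the literal "0": the anchor is never looked at.
    if text == "0":
        return 0
    if anchor is not None:
        return AnchoredInt(int(text), anchor=anchor)
    return int(text)


def construct_int_fixed(text, anchor=None):
    # Without the shortcut, "0" goes through the same anchor check as any other value.
    if anchor is not None:
        return AnchoredInt(int(text), anchor=anchor)
    return int(text)


buggy_zero = construct_int_with_shortcut("0", anchor="person_age")
buggy_one = construct_int_with_shortcut("1", anchor="person_age")
fixed_zero = construct_int_fixed("0", anchor="person_age")

print(type(buggy_zero).__name__, hasattr(buggy_zero, "anchor"))  # int False
print(type(buggy_one).__name__, buggy_one.anchor)  # AnchoredInt person_age
print(type(fixed_zero).__name__, fixed_zero.anchor)  # AnchoredInt person_age
```

The shortcut version reproduces the symptom from the question: 1 keeps its anchor, while 0 comes back as a plain int with no anchor attribute.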
This will be solved in the next release of ruamel.yaml, but in the meantime you should be able to do something like:
import sys
import ruamel.yaml
# these int subclasses are used by name in the constructor below
from ruamel.yaml.scalarint import BinaryInt, HexCapsInt, HexInt, OctalInt

yaml_str = """\
person1: &person_age1 0
person2: &person_age2 1 # this works
"""

yaml = ruamel.yaml.YAML()

if ruamel.yaml.version_info < (0, 18, 7):

    class MyConstructor(ruamel.yaml.constructor.RoundTripConstructor):
        def construct_yaml_int(self, node):
            width = None
            value_su = self.construct_scalar(node)
            try:
                sx = value_su.rstrip('_')
                underscore = [len(sx) - sx.rindex('_') - 1, False, False]
            except ValueError:
                underscore = None
            except IndexError:
                underscore = None
            value_s = value_su.replace('_', "")
            sign = +1
            if value_s[0] == '-':
                sign = -1
            if value_s[0] in '+-':
                value_s = value_s[1:]
            # note: no shortcut for the string "0" here, so it reaches the anchor handling
            if value_s.startswith('0b'):
                if self.resolver.processing_version > (1, 1) and value_s[2] == '0':
                    width = len(value_s[2:])
                if underscore is not None:
                    underscore[1] = value_su[2] == '_'
                    underscore[2] = len(value_su[2:]) > 1 and value_su[-1] == '_'
                return BinaryInt(
                    sign * int(value_s[2:], 2),
                    width=width,
                    underscore=underscore,
                    anchor=node.anchor,
                )
            elif value_s.startswith('0x'):
                # default to lower-case if no a-fA-F in string
                if self.resolver.processing_version > (1, 1) and value_s[2] == '0':
                    width = len(value_s[2:])
                hex_fun = HexInt
                for ch in value_s[2:]:
                    if ch in 'ABCDEF':  # first non-digit is capital
                        hex_fun = HexCapsInt
                        break
                    if ch in 'abcdef':
                        break
                if underscore is not None:
                    underscore[1] = value_su[2] == '_'
                    underscore[2] = len(value_su[2:]) > 1 and value_su[-1] == '_'
                return hex_fun(
                    sign * int(value_s[2:], 16),
                    width=width,
                    underscore=underscore,
                    anchor=node.anchor,
                )
            elif value_s.startswith('0o'):
                if self.resolver.processing_version > (1, 1) and value_s[2] == '0':
                    width = len(value_s[2:])
                if underscore is not None:
                    underscore[1] = value_su[2] == '_'
                    underscore[2] = len(value_su[2:]) > 1 and value_su[-1] == '_'
                return OctalInt(
                    sign * int(value_s[2:], 8),
                    width=width,
                    underscore=underscore,
                    anchor=node.anchor,
                )
            elif self.resolver.processing_version != (1, 2) and value_s[0] == '0':
                return OctalInt(
                    sign * int(value_s, 8), width=width, underscore=underscore, anchor=node.anchor,
                )
            elif self.resolver.processing_version != (1, 2) and ':' in value_s:
                digits = [int(part) for part in value_s.split(':')]
                digits.reverse()
                base = 1
                value = 0
                for digit in digits:
                    value += digit * base
                    base *= 60
                return sign * value
            elif self.resolver.processing_version > (1, 1) and value_s[0] == '0':
                # not an octal, an integer with leading zero(s)
                if underscore is not None:
                    # cannot have a leading underscore
                    underscore[2] = len(value_su) > 1 and value_su[-1] == '_'
                return ruamel.yaml.scalarint.ScalarInt(
                    sign * int(value_s), width=len(value_s), underscore=underscore, anchor=node.anchor,
                )
            elif underscore:
                # cannot have a leading underscore
                underscore[2] = len(value_su) > 1 and value_su[-1] == '_'
                return ruamel.yaml.scalarint.ScalarInt(
                    sign * int(value_s), width=None, underscore=underscore, anchor=node.anchor,
                )
            elif node.anchor:
                return ruamel.yaml.scalarint.ScalarInt(sign * int(value_s), width=None, anchor=node.anchor)
            else:
                return sign * int(value_s)

    MyConstructor.add_default_constructor('int')
    yaml.Constructor = MyConstructor

data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)
which gives:
person1: &person_age1 0
person2: &person_age2 1 # this works
From looking at the code I also noticed that anchored sexagesimals (references to which were dropped in the 1.2 spec) lose their anchor, but sexagesimals are not preserved anyway.
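For readers unfamiliar with sexagesimals: in YAML 1.1 an integer could be written in base 60 with colon-separated parts. A minimal sketch of the conversion, with the same result as the reversed-digits loop in the constructor above, could look like this. The helper name sexagesimal_to_int is made up for this illustration and is not part of ruamel.yaml.

```python
def sexagesimal_to_int(text):
    """Hypothetical helper: convert a YAML 1.1 base-60 integer such as '1:30:00'."""
    sign = -1 if text.startswith("-") else 1
    digits = [int(part) for part in text.lstrip("+-").split(":")]
    value = 0
    for digit in digits:
        # each part is one base-60 "digit", most significant first
        value = value * 60 + digit
    return sign * value


print(sexagesimal_to_int("1:30:00"))  # 5400
print(sexagesimal_to_int("190:20:30"))  # 685230
```

The result is a plain int, so, as in the corresponding branch of the constructor, there is no object on which an anchor could be stored.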