I am using RangeDict to make a dictionary that contains ranges. When I use Pickle it is easily written to a file and later read.
import pickle
from rangedict import RangeDict
rngdct = RangeDict()
rngdct[(1, 9)] = \
{"Type": "A", "Series": "1"}
rngdct[(10, 19)] = \
{"Type": "B", "Series": "1"}
with open('rangedict.pickle', 'wb') as f:
    pickle.dump(rngdct, f)
However, I want to use YAML (or JSON, if YAML won't work) instead of Pickle, since many people seem to dislike it, and I want human-readable files that make sense to the people reading them.
Basically, changing the code to call yaml and opening the file in 'w' mode, not in 'wb', does the trick for the writing side, but when I read the file in another script, I get these errors:
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/yaml/constructor.py", line 129, in construct_mapping
    value = self.construct_object(value_node, deep=deep)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/yaml/constructor.py", line 61, in construct_object
    "found unconstructable recursive node", node.start_mark)
yaml.constructor.ConstructorError: found unconstructable recursive node
I'm lost here. How can I serialize the RangeDict object and read it back in its original form?
TL;DR; Skip to the bottom of this answer for working code
I am sure some people hate pickle; it certainly can give some headaches when refactoring code (when the classes of pickled objects move to different files). But the bigger problem is that pickle is insecure, just as YAML is in the way that you used it.
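To make the "insecure" part concrete, here is a minimal sketch using only the standard library. The class name Innocent is made up for illustration, and the callable is deliberately harmless; the point is only that loading a pickle calls whatever callable the data names.

```python
import pickle


class Innocent:
    """Illustrative class: __reduce__ tells pickle what to call on load."""

    def __reduce__(self):
        # Harmless here (sorted), but this could be any importable callable
        # with arguments chosen by whoever produced the file.
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(Innocent())

# Loading does not give back an Innocent instance:
# it calls sorted([3, 1, 2]) and returns the result.
result = pickle.loads(payload)
print(type(result).__name__, result)
```

The same mechanism is what makes an unrestricted YAML load() of `!!python/object` tags risky: the file decides what gets constructed.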
It is interesting to note that you cannot pickle to the more readable protocol level 0 (the default in Python 3.6 is protocol version 3), as
pickle.dump(rngdct, f, protocol=0)
will throw:
TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled
This is because the RangeDict module/class is a bit minimalistic, which also shows (or rather doesn't) if you try to do:
print(rngdct)
which will just print {}
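The protocol 0 error can be reproduced without rangedict installed. The sketch below uses a hypothetical SlottedNode class as a stand-in; it assumes, as the error message suggests, that the real node class defines __slots__ and no __getstate__.

```python
import pickle


class SlottedNode:
    """Hypothetical stand-in for a tree node that uses __slots__."""

    __slots__ = ("left", "right", "value")

    def __init__(self, value):
        self.left = None
        self.right = None
        self.value = value


node = SlottedNode("A")

try:
    # Protocols 0 and 1 take the old reduce path, which refuses
    # slotted classes that do not define __getstate__.
    pickle.dumps(node, protocol=0)
    error = None
except TypeError as exc:
    error = str(exc)

print(error)
```

Protocol 2 and higher know how to collect slot values themselves, which is why the default protocol works for the question's code.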
You probably used the PyYAML dump() routine (and its corresponding, unsafe, load()). And although that can dump generic Python classes, you have to realise that it was implemented before, or roughly at the same time as, Python 3.0 (and Python 3 support was implemented later on). And although there is no reason a YAML parser could not dump and load the exact information that pickle does, it doesn't hook into the pickle support routines (although it could), and certainly not into the information for the Python 3 specific pickling protocols.
Anyway, without a specific representer (and constructor) for RangeDict objects, using YAML doesn't really make any sense: it makes loading potentially unsafe and your YAML includes all of the gory details that make the object efficient. If you do yaml.dump(), you get:
!!python/object:rangedict.RangeDict
_root: &id001 !!python/object/new:rangedict.Node
  state: !!python/tuple
  - null
  - color: 0
    left: null
    parent: null
    r: !!python/tuple [1, 9]
    right: !!python/object/new:rangedict.Node
      state: !!python/tuple
      - null
      - color: 1
        left: null
        parent: *id001
        r: !!python/tuple [10, 19]
        right: null
        value: {Series: '1', Type: B}
    value: {Series: '1', Type: A}
Where IMO a readable representation in YAML would be:
!rangedict
[1, 9]:
  Type: A
  Series: '1'
[10, 19]:
  Type: B
  Series: '1'
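One detail of this representation is worth spelling out: its keys are YAML sequences. A loader that turns sequences into Python lists cannot use them as dict keys, because lists are unhashable, whereas tuples can be used. A stdlib-only sketch of that constraint (the variable names are illustrative):

```python
# What a YAML sequence used as a key becomes in a plain list-building loader.
range_as_list = [1, 9]
value = {"Type": "A", "Series": "1"}

try:
    mapping = {range_as_list: value}
    list_key_error = None
except TypeError as exc:
    # Lists are mutable and therefore unhashable.
    list_key_error = str(exc)

# Converting the key to a tuple makes it usable as a dict key.
mapping = {tuple(range_as_list): value}
print(list_key_error)
print(mapping[(1, 9)])
```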
Because of the sequences used as keys, this cannot be loaded by PyYAML without major modifications to the parser. But fortunately, those modifications have been incorporated in ruamel.yaml (disclaimer: I am the author of that package), so "all" you need to do is subclass RangeDict to provide suitable representer and constructor (class) methods:
import io
import ruamel.yaml
from rangedict import RangeDict

class MyRangeDict(RangeDict):
    yaml_tag = u'!rangedict'

    def _walk(self, cur):
        # walk tree left -> parent -> right
        if cur.left:
            for x in self._walk(cur.left):
                yield x
        yield cur.r
        if cur.right:
            for x in self._walk(cur.right):
                yield x

    @classmethod
    def to_yaml(cls, representer, node):
        d = ruamel.yaml.comments.CommentedMap()
        for x in node._walk(node._root):
            d[ruamel.yaml.comments.CommentedKeySeq(x)] = node[x[0]]
        return representer.represent_mapping(cls.yaml_tag, d)

    @classmethod
    def from_yaml(cls, constructor, node):
        d = cls()
        for x, y in node.value:
            x = constructor.construct_object(x, deep=True)
            y = constructor.construct_object(y, deep=True)
            d[x] = y
        return d

rngdct = MyRangeDict()
rngdct[(1, 9)] = \
    {"Type": "A", "Series": "1"}
rngdct[(10, 19)] = \
    {"Type": "B", "Series": "1"}

yaml = ruamel.yaml.YAML()
yaml.register_class(MyRangeDict)  # tell the yaml instance about this class

buf = io.StringIO()
yaml.dump(rngdct, buf)
data = yaml.load(buf.getvalue())

# test for round-trip equivalence:
for x in data._walk(data._root):
    for y in range(x[0], x[1]+1):
        assert data[y]['Type'] == rngdct[y]['Type']
        assert data[y]['Series'] == rngdct[y]['Series']
The buf.getvalue() is exactly the readable representation shown before.
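The _walk helper in the code above is an in-order traversal (left, parent, right), which is why the ranges come out sorted. A stand-alone sketch of the same idea follows; Node here is a hypothetical stand-in with only the fields the traversal touches, and it uses Python 3's `yield from` plus a None guard so an empty tree yields nothing.

```python
class Node:
    """Hypothetical stand-in for a tree node; only the fields walk() needs."""

    __slots__ = ("r", "left", "right")

    def __init__(self, r, left=None, right=None):
        self.r = r
        self.left = left
        self.right = right


def walk(cur):
    """Yield the range stored in each node, in left -> parent -> right order."""
    if cur is None:
        return
    yield from walk(cur.left)
    yield cur.r
    yield from walk(cur.right)


root = Node((10, 19), left=Node((1, 9)), right=Node((20, 29)))
ranges = list(walk(root))
print(ranges)
```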
If you have to deal with dumping RangeDict itself (i.e. cannot subclass because you use some library that has RangeDict hardcoded), then you can add the attribute and methods of MyRangeDict directly to RangeDict by grafting/monkeypatching.
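A minimal sketch of such grafting, using only the standard library. LibraryRangeDict and Extras are hypothetical stand-ins (for RangeDict and MyRangeDict respectively), and the method names are illustrative; the technique is copying the raw entries out of the donor class's __dict__.

```python
class LibraryRangeDict:
    """Hypothetical stand-in for a class that a third-party library hardcodes."""

    def __init__(self):
        self.ranges = {}

    def __setitem__(self, r, value):
        self.ranges[tuple(r)] = value


class Extras:
    """Donor class holding the attribute and methods to graft on."""

    yaml_tag = '!rangedict'

    def sorted_ranges(self):
        return sorted(self.ranges)

    @classmethod
    def from_pairs(cls, pairs):
        d = cls()
        for r, value in pairs:
            d[r] = value
        return d


# Take the raw entries from the class __dict__, so that classmethods are
# bound to the target class on access instead of staying bound to Extras.
for name in ('yaml_tag', 'sorted_ranges', 'from_pairs'):
    setattr(LibraryRangeDict, name, Extras.__dict__[name])

grafted = LibraryRangeDict.from_pairs([((10, 19), 'B'), ((1, 9), 'A')])
print(type(grafted).__name__, grafted.sorted_ranges())
```

Using getattr(Extras, name) instead would hand over a classmethod already bound to Extras, so from_pairs would build the wrong type.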