I am trying to create a dict
which contains only a limited number of MRU entries (for helping in caching the output of a costly C function I call via ctypes). Here is the code:
from collections import OrderedDict
class MRUDict(OrderedDict):
def __init__(self, capacity = 64):
super().__init__()
self.__checkAndSetCapacity(capacity)
def capacity(self):
return self.__capacity
def setCapacity(self, capacity):
self.__checkAndSetCapacity(capacity)
for i in range(len(self) - capacity):
self.__evict() # will execute only if len > capacity
def __getitem__(self, key):
value = super().__getitem__(key)
# if above raises IndexError, next line won't execute
print("Moving key {} to last i.e. MRU position".format(key))
super().move_to_end(key)
return value
def __setitem__(self, key, value):
if key in self:
super().move_to_end(key)
else: # new key
if len(self) == self.__capacity:
self.__evict()
super().__setitem__(key, value)
def __evict(self):
key, value = self.popitem(last = False) # pop first i.e. oldest item
print("Capacity exceeded. Evicting ({}, {})".format(key, value))
def __checkAndSetCapacity(self, capacity):
if not isinstance(capacity, int):
raise TypeError("Capacity should be an int.")
if capacity == 0:
raise ValueError("Capacity should not be zero.")
self.__capacity = capacity
... and here is the testing code:
def printkeys(d):
print("Current keys in order:", tuple(d)) # here d means d.keys()
print()
from mrudict import MRUDict
print("Creating MRUDict with capacity 5.")
d = MRUDict(5)
print("Adding keys 0 to 7 with values:")
for i in range(8): d[i] = i + 0.1
printkeys(d)
print("Calling str on object:")
print(d) # test of default __repr__ (since probably __str__ is the same here)
printkeys(d)
print("Accessing existing key 4:")
print(4, d[4]) # test of __getitem__
printkeys(d)
try:
print("Accessing non-existing key 20:")
print(20, d[20]) # test of __getitem__
except:
print("Caught exception: key does not exist.")
printkeys(d)
print("Updating value of existing key 6:")
d[6] = 6.6 # test of __setitem__ with existing key
printkeys(d)
print("Adding new key, value pair:")
d[10] = 10.1 # test of __setitem__ with non-existing key
printkeys(d)
print("Testing for presence of key 3:")
print(3 in d)
printkeys(d)
print("Trying to loop over the items:")
for k in d: print(k, d[k])
printkeys(d)
print("Trying to loop over the items:")
for k, v in d.items(): print(k, v)
printkeys(d)
Now from the output it seems I am being kind of naive in implementing the __getitem__
function because both __repr__
and for ... in
(which, I'm guessing here, call __iter__
and then __getitem__
) causes the first item to be moved to the last as MRU, but cannot proceed further because there is no "next" item for the iterator since it now points to the last element. But I am not sure what I can do to fix the situation. Should I reimplement __iter__
?
I am not sure how to distinguish between the user's calling __getitem__
and an internal call to the same. Of course, a workaround is to make the user to use a find()
method which would do the move-to-end thing, but I'd really like to be able to use the regular syntax d[k]
.
Please advise on how to fix this. Thanks!
For complex changes of behaviour like these, it pays to study the OrderedDict
source code.
The actual __iter__
method loops directly over the internal structure, the doubly linked list that maintains the item order. It'll never directly use __getitem__
, instead just returning the keys from the linked list.
The actual problem that you are having is that you are directly accessing the items while looping:
for k in d: print(k, d[k])
There is a d[k]
in there; it is that access that moves item 5 from start to end. This updates the linked list, so when asking for the next item the curr.next
reference is now the root and iteration stops.
The work-around would be to not do that. Add a dedicated method to access items without triggering the MRU update. Or you could re-use dict.get()
for example:
>>> for k in d: print(k, d.get(k))
...
5 5.1
7 7.1
4 4.1
6 6.6
10 10.1
You will have a problem with the .items()
method; OrderedDict
reuses collections.abc.MutableMapping
's .items()
method, which returns a collections.abc.ItemsView()
instance; see the collections.abc
source code.
You'll have to replace that behaviour:
from collections.abc import ItemsView
class MRUDictItemsView(ItemsView):
def __contains__(self, item):
key, value = item
v = self._mapping.get(key, object())
return v == value
def __iter__(self):
for key in self._mapping:
yield (key, self._mapping.get(key))
class MRUDict(OrderedDict):
# ...
def items(self):
return MRUDictItemsView(self)
You'd have to do the same for the .keys()
and .values()
methods.