pythondictionarymru

Python: Trying to create a dict containing limited MRU entries


I am trying to create a dict which contains only a limited number of MRU entries (for helping in caching the output of a costly C function I call via ctypes). Here is the code:

from collections import OrderedDict

class MRUDict(OrderedDict):

    def __init__(self, capacity = 64):
        super().__init__()
        self.__checkAndSetCapacity(capacity)

    def capacity(self):
        return self.__capacity

    def setCapacity(self, capacity):
        self.__checkAndSetCapacity(capacity)
        for i in range(len(self) - capacity):
            self.__evict() # will execute only if len > capacity

    def __getitem__(self, key):
        value = super().__getitem__(key)
        # if above raises IndexError, next line won't execute
        print("Moving key {} to last i.e. MRU position".format(key))
        super().move_to_end(key)
        return value

    def __setitem__(self, key, value):
        if key in self:
            super().move_to_end(key)
        else: # new key
            if len(self) == self.__capacity:
                self.__evict()
        super().__setitem__(key, value)

    def __evict(self):
        key, value = self.popitem(last = False) # pop first i.e. oldest item
        print("Capacity exceeded. Evicting ({}, {})".format(key, value))

    def __checkAndSetCapacity(self, capacity):
        if not isinstance(capacity, int):
            raise TypeError("Capacity should be an int.")
        if capacity == 0:
            raise ValueError("Capacity should not be zero.")
        self.__capacity = capacity

... and here is the testing code:

def printkeys(d):
    print("Current keys in order:", tuple(d)) # here d means d.keys()
    print()

from mrudict import MRUDict
print("Creating MRUDict with capacity 5.")
d = MRUDict(5)
print("Adding keys 0 to 7 with values:")
for i in range(8): d[i] = i + 0.1
printkeys(d)

print("Calling str on object:")
print(d) # test of default __repr__ (since probably __str__ is the same here)
printkeys(d)

print("Accessing existing key 4:")
print(4, d[4]) # test of __getitem__
printkeys(d)

try:
    print("Accessing non-existing key 20:")
    print(20, d[20]) # test of __getitem__
except:
    print("Caught exception: key does not exist.")
printkeys(d)

print("Updating value of existing key 6:")
d[6] = 6.6 # test of __setitem__ with existing key
printkeys(d)

print("Adding new key, value pair:")
d[10] = 10.1 # test of __setitem__ with non-existing key
printkeys(d)

print("Testing for presence of key 3:")
print(3 in d)
printkeys(d)

print("Trying to loop over the items:")
for k in d: print(k, d[k])
printkeys(d)

print("Trying to loop over the items:")
for k, v in d.items(): print(k, v)
printkeys(d)

Now from the output it seems I am being kind of naive in implementing the __getitem__ function because both __repr__ and for ... in (which, I'm guessing here, call __iter__ and then __getitem__) causes the first item to be moved to the last as MRU, but cannot proceed further because there is no "next" item for the iterator since it now points to the last element. But I am not sure what I can do to fix the situation. Should I reimplement __iter__?

I am not sure how to distinguish between the user's calling __getitem__ and an internal call to the same. Of course, a workaround is to make the user to use a find() method which would do the move-to-end thing, but I'd really like to be able to use the regular syntax d[k].

Please advise on how to fix this. Thanks!


Solution

  • For complex changes of behaviour like these, it pays to study the OrderedDict source code.

    The actual __iter__ method loops directly over the internal structure, the doubly linked list that maintains the item order. It'll never directly use __getitem__, instead just returning the keys from the linked list.

    The actual problem that you are having is that you are directly accessing the items while looping:

    for k in d: print(k, d[k])
    

    There is a d[k] in there; it is that access that moves item 5 from start to end. This updates the linked list, so when asking for the next item the curr.next reference is now the root and iteration stops.

    The work-around would be to not do that. Add a dedicated method to access items without triggering the MRU update. Or you could re-use dict.get() for example:

    >>> for k in d: print(k, d.get(k))
    ... 
    5 5.1
    7 7.1
    4 4.1
    6 6.6
    10 10.1
    

    You will have a problem with the .items() method; OrderedDict reuses collections.abc.MutableMapping's .items() method, which returns a collections.abc.ItemsView() instance; see the collections.abc source code.

    You'll have to replace that behaviour:

    from collections.abc import ItemsView
    
    
    class MRUDictItemsView(ItemsView):
        def __contains__(self, item):
            key, value = item
            v = self._mapping.get(key, object())
            return v == value
    
        def __iter__(self):
            for key in self._mapping:
                yield (key, self._mapping.get(key))
    
    
    class MRUDict(OrderedDict):
        # ...
    
        def items(self):
            return MRUDictItemsView(self)
    

    You'd have to do the same for the .keys() and .values() methods.