python caching ttl

Where is the bug in this seemingly simple TTL cache access pattern?


I ran into an issue, shown in this Python snippet:

from cachetools import TTLCache

cache = TTLCache(maxsize=SOME_SIZE, ttl=SOME_TTL)

def fetch(key):
  if key not in cache:
    result = database.get_result(key)
    cache[key] = result
  result = cache[key]
  return result

This is a simple access pattern where I cache some results in a TTL cache. But in some edge cases I get a KeyError from the cache: somehow, the entry has expired by then. How can this happen? If the key isn't in the cache, I put it in the cache and immediately use it. Otherwise, I fetch it from the cache and use it. I'm using a 5-minute TTL as well. Could I be running into garbage collection or some other edge case?

Edit: Improve variable names for question readability.


Solution

  • It's possible that, in the time between "check if item is in cache" and "get the item from the cache", the item may have actually been removed from the cache.

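    Without looking at the code for TTLCache, and given Python's normally sequential processing, this would likely happen only if either:

      • something else (another thread, say) modified the cache between the check and the get; or
      • the check and get operations themselves handle expiry as part of their work.

    But, in fact, after looking at the code, it uses a variation of the second point above. Whenever an item is cached, it is given an expiry time based on the current time plus the time to live.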

    Then the __contains__ (for in) and __getitem__ (for []) dunder methods both check this expiry time, and an expired item is treated as no longer being in the cache.

    So, if the current time is before the expiry time when you check that the key exists, but after the expiry time when you attempt retrieval, that could easily account for your occasional edge cases.

    The normal approach in these situations is to "seek forgiveness rather than ask permission". That would entail trying to get the cached entry and, only if that raises an error, getting the real value and re-caching it, something along the lines of:

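    def fetch(key):
        try:
            # Single lookup: the expiry check and the read happen together.
            result = cache[key]
        except KeyError:
            # Missing or expired: reload from the database (same call as
            # in the question) and re-cache the fresh value.
            result = database.get_result(key)
            cache[key] = result
        return result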
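
To see the race and the fix side by side without waiting five minutes, here is a minimal sketch. It does not use cachetools: TinyTTLCache is a hypothetical stand-in that only mimics the behaviour described above (expiry checked on every access), and SteppingClock, fetch_lbyl, fetch_eafp and load are illustrative names, not part of any library. The fake clock jumps forward on every read so that the entry expires exactly between the "in" check and the "[]" lookup.

```python
class SteppingClock:
    """Fake clock that advances by `step` every time it is read."""

    def __init__(self, step):
        self.now = 0
        self.step = step

    def __call__(self):
        current = self.now
        self.now += self.step
        return current


class TinyTTLCache:
    """Stand-in TTL cache: expiry is checked lazily on each access."""

    def __init__(self, ttl, timer):
        self.ttl = ttl
        self.timer = timer      # callable returning the current time
        self._data = {}         # key -> (value, expires_at)

    def __setitem__(self, key, value):
        self._data[key] = (value, self.timer() + self.ttl)

    def __contains__(self, key):
        entry = self._data.get(key)
        return entry is not None and self.timer() < entry[1]

    def __getitem__(self, key):
        value, expires_at = self._data[key]  # KeyError if never stored
        if self.timer() >= expires_at:
            del self._data[key]
            raise KeyError(key)
        return value


loads = []  # records every simulated database hit


def load(key):
    """Hypothetical stand-in for the database call."""
    loads.append(key)
    return f"value-for-{key}"


def fetch_lbyl(cache, key):
    """Check-then-read, as in the question: two separate expiry checks."""
    if key not in cache:
        cache[key] = load(key)
    return cache[key]


def fetch_eafp(cache, key):
    """Read first, reload on KeyError: one expiry check per lookup."""
    try:
        return cache[key]
    except KeyError:
        value = load(key)
        cache[key] = value
        return value


# Entry stored at t=0 with ttl=5; each clock read advances time by 3.
cache = TinyTTLCache(ttl=5, timer=SteppingClock(step=3))
cache["k"] = "cached"
try:
    fetch_lbyl(cache, "k")      # "in" check at t=3 passes, read at t=6 fails
    lbyl_failed = False
except KeyError:
    lbyl_failed = True

cache = TinyTTLCache(ttl=5, timer=SteppingClock(step=3))
cache["k"] = "cached"
first = fetch_eafp(cache, "k")   # read at t=3: entry still valid
second = fetch_eafp(cache, "k")  # read at t=6: expired, so reloaded

print(lbyl_failed, first, second, loads)
```

With a real 5-minute TTL the window is tiny, which is why the failure only shows up occasionally. The try/except version has no such window, because it never relies on an earlier check still being true.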