Tags: python, decorator, python-decorators, diskcache

How to inform user that cache is being used?


I am using the Python library diskcache and its decorator @cache.memoize to cache calls to my CouchDB database. It works fine. However, I would like to tell the user whether the data is returned from the database or from the cache.

I don't even know how to approach this problem.

My code so far:

import couchdb
from diskcache import Cache

cache = Cache("couch_cache")


@cache.memoize()
def fetch_doc(url: str, database: str, doc_id: str) -> dict:
    server = couchdb.Server(url=url)
    db = server[database]

    return dict(db[doc_id])

Solution

  • Here's a way, though I don't really recommend it, because (1) it adds an extra operation of checking the cache manually yourself, and (2) it probably duplicates what the library is already doing internally. I haven't properly measured the performance impact, since I don't have a production environment with varied doc_ids, but as martineau's comment says, the extra lookup could slow things down.

    But here it goes.

    The diskcache.Cache object "supports a familiar Python mapping interface" (like dicts). You can therefore manually check whether a given key is already present in the cache, using the same key that memoize automatically generates from the function's arguments:

    An additional __cache_key__ attribute can be used to generate the cache key used for the given arguments.

    >>> key = fibonacci.__cache_key__(100)
    >>> print(cache[key])
    354224848179261915075
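
    To make that key-check pattern concrete without a CouchDB instance, here is a minimal, self-contained sketch using a plain dict in place of diskcache.Cache; the names memoize and make_key are hypothetical stand-ins for what diskcache does internally:

```python
# Minimal sketch of the key-check pattern. A plain dict stands in for
# diskcache.Cache; `make_key` and `memoize` are hypothetical stand-ins.
cache = {}

def make_key(*args):
    # diskcache derives the key from the function arguments; a tuple works here
    return args

def memoize(func):
    def wrapper(*args):
        key = make_key(*args)
        if key not in cache:
            cache[key] = func(*args)
        return cache[key]
    return wrapper

@memoize
def square(x):
    return x * x

square(4)                    # first call: computes and stores the result
print(make_key(4) in cache)  # True: the key is now present in the cache
```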
    

    So you can wrap your fetch_doc function in another function that checks whether a cache key built from the url, database, and doc_id arguments already exists, prints the result for the user, and then calls the actual fetch_doc function:

    import couchdb
    from diskcache import Cache
    
    cache = Cache("couch_cache")
    
    @cache.memoize()
    def fetch_doc(url: str, database: str, doc_id: str) -> dict:
        server = couchdb.Server(url=url)
        db = server[database]
        return dict(db[doc_id])
    
    def fetch_doc_with_logging(url: str, database: str, doc_id: str):
        # Generate the key
        key = fetch_doc.__cache_key__(url, database, doc_id)
    
        # Print out whether getting from cache or not
        if key in cache:
            print(f'Getting {doc_id} from cache!')
        else:
            print(f'Getting {doc_id} from DB!')
    
        # Call the actual memoize-d function
        return fetch_doc(url, database, doc_id)
    

    When testing that out with:

    url = 'https://your.couchdb.instance'
    database = 'test'
    doc_id = 'c97bbe3127fb6b89779c86da7b000885'
    
    cache.stats(enable=True, reset=True)
    for _ in range(5):
        fetch_doc_with_logging(url, database, doc_id)
    print(f'(hits, misses) = {cache.stats()}')
    
    # Only for testing, so 1st call will always miss and will get from DB
    cache.clear()
    

    It outputs:

    $ python test.py 
    Getting c97bbe3127fb6b89779c86da7b000885 from DB!
    Getting c97bbe3127fb6b89779c86da7b000885 from cache!
    Getting c97bbe3127fb6b89779c86da7b000885 from cache!
    Getting c97bbe3127fb6b89779c86da7b000885 from cache!
    Getting c97bbe3127fb6b89779c86da7b000885 from cache!
    (hits, misses) = (4, 1)
    

    You can turn that wrapper function into a decorator:

    def log_if_cache_or_not(memoized_func):
        def _wrap(*args):
            key = memoized_func.__cache_key__(*args)
            # doc_id is the last positional argument of fetch_doc;
            # don't rely on a global doc_id inside the decorator
            doc_id = args[-1]
            if key in cache:
                print(f'Getting {doc_id} from cache!')
            else:
                print(f'Getting {doc_id} from DB!')
            return memoized_func(*args)

        return _wrap
    
    @log_if_cache_or_not
    @cache.memoize()
    def fetch_doc(url: str, database: str, doc_id: str) -> dict:
        server = couchdb.Server(url=url)
        db = server[database]
        return dict(db[doc_id])
    
    for _ in range(5):
        fetch_doc(url, database, doc_id)
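
    Note that the decorator above only handles positional arguments and drops the wrapped function's metadata. Below is a slightly more general sketch using functools.wraps; the FakeMemoized class is a hypothetical stand-in for a diskcache-memoized function, so the snippet runs without diskcache installed:

```python
# Sketch of a kwargs-aware logging decorator. `FakeMemoized` is a
# hypothetical stand-in for a diskcache-memoized function (it exposes a
# __cache_key__ method like diskcache does), so no diskcache is needed.
import functools

cache = {}  # plain dict standing in for diskcache.Cache

class FakeMemoized:
    """Mimics a memoized function that exposes __cache_key__."""
    def __init__(self, func):
        self.func = func
        functools.update_wrapper(self, func)

    def __cache_key__(self, *args, **kwargs):
        return (args, tuple(sorted(kwargs.items())))

    def __call__(self, *args, **kwargs):
        key = self.__cache_key__(*args, **kwargs)
        if key not in cache:
            cache[key] = self.func(*args, **kwargs)
        return cache[key]

def log_if_cache_or_not(memoized_func):
    @functools.wraps(memoized_func)
    def _wrap(*args, **kwargs):
        # Build the key from *all* arguments, not just positional ones
        key = memoized_func.__cache_key__(*args, **kwargs)
        source = 'cache' if key in cache else 'DB'
        print(f'Getting {args} {kwargs} from {source}!')
        return memoized_func(*args, **kwargs)
    return _wrap

@log_if_cache_or_not
@FakeMemoized
def fetch_doc(url, database, doc_id):
    return {'_id': doc_id}

fetch_doc('http://x', 'test', doc_id='abc')  # first call: from DB
fetch_doc('http://x', 'test', doc_id='abc')  # second call: from cache
```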
    

    Or, as suggested in the comments, combine the two into one new decorator:

    def memoize_with_logging(func):
        memoized_func = cache.memoize()(func)

        def _wrap(*args):
            key = memoized_func.__cache_key__(*args)
            # doc_id is the last positional argument of fetch_doc;
            # don't rely on a global doc_id inside the decorator
            doc_id = args[-1]
            if key in cache:
                print(f'Getting {doc_id} from cache!')
            else:
                print(f'Getting {doc_id} from DB!')
            return memoized_func(*args)

        return _wrap
    
    @memoize_with_logging
    def fetch_doc(url: str, database: str, doc_id: str) -> dict:
        server = couchdb.Server(url=url)
        db = server[database]
        return dict(db[doc_id])
    
    for _ in range(5):
        fetch_doc(url, database, doc_id)
    

    Some quick testing:

    In [9]: %timeit for _ in range(100000): fetch_doc(url, database, doc_id)
    13.7 s ± 112 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    
    In [10]: %timeit for _ in range(100000): fetch_doc_with_logging(url, database, doc_id)
    21.2 s ± 637 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    

    (The benchmark would probably be more meaningful if doc_id were varied randomly across calls.)

    Again, as I mentioned at the start, caching and memoizing the function call is supposed to speed that function up. This answer adds the extra operations of a cache lookup and printing/logging whether you are fetching from the DB or from the cache, which can hurt the performance of the call. Test appropriately.
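
    As an aside: if disk persistence isn't actually required, the standard library's functools.lru_cache allows the same kind of introspection without a manual key lookup. Comparing cache_info().hits before and after the call tells you whether it was served from the cache. A sketch (in-memory only, unlike diskcache, and the hit-counter comparison is not thread-safe):

```python
import functools

@functools.lru_cache(maxsize=None)
def fetch_doc(url, database, doc_id):
    # stand-in for the real CouchDB fetch
    return {'_id': doc_id}

def fetch_doc_with_logging(url, database, doc_id):
    hits_before = fetch_doc.cache_info().hits
    result = fetch_doc(url, database, doc_id)
    # If the hit counter advanced, the call was answered from the cache
    if fetch_doc.cache_info().hits > hits_before:
        print(f'Getting {doc_id} from cache!')
    else:
        print(f'Getting {doc_id} from DB!')
    return result

fetch_doc_with_logging('http://x', 'test', 'abc')  # from DB
fetch_doc_with_logging('http://x', 'test', 'abc')  # from cache
```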