pythonmemoization

memoize to disk - Python - persistent memoization


Is there a way to memoize the output of a function to disk?

I have a function

def getHtmlOfUrl(url):
    ... # expensive computation

and would like to do something like:

def getHtmlMemoized(url) = memoizeToFile(getHtmlOfUrl, "file.dat")

and then call getHtmlMemoized(url), to do the expensive computation only once for each url.


Solution

  • Python offers a very elegant way to do this - decorators. Basically, a decorator is a function that wraps another function to provide additional functionality without changing the function source code. Your decorator can be written like this:

    import json
    
    def persist_to_file(file_name):
    
        def decorator(original_func):
    
            try:
                cache = json.load(open(file_name, 'r'))
            except (IOError, ValueError):
                cache = {}
    
            def new_func(param):
                if param not in cache:
                    cache[param] = original_func(param)
                    json.dump(cache, open(file_name, 'w'))
                return cache[param]
    
            return new_func
    
        return decorator
    

    Once you've got that, 'decorate' the function using @-syntax and you're ready.

    @persist_to_file('cache.dat')
    def html_of_url(url):
        your function code...
    

    Note that this decorator is intentionally simplified and may not work for every situation, for example, when the source function accepts or returns data that cannot be json-serialized.

    More on decorators: How to make a chain of function decorators?

    And here's how to make the decorator save the cache just once, at exit time:

    import json, atexit
    
    def persist_to_file(file_name):
    
        try:
            cache = json.load(open(file_name, 'r'))
        except (IOError, ValueError):
            cache = {}
    
        atexit.register(lambda: json.dump(cache, open(file_name, 'w')))
    
        def decorator(func):
            def new_func(param):
                if param not in cache:
                    cache[param] = func(param)
                return cache[param]
            return new_func
    
        return decorator