Tags: python, lru, single-dispatch

How to combine @singledispatch and @lru_cache?


I have a Python single-dispatch generic function like this:

from functools import lru_cache, singledispatch

@singledispatch
def cluster(documents, n_clusters=8, min_docs=None, depth=2):
    ...

It is overloaded like this:

@cluster.register(QuerySet)
@lru_cache(maxsize=512)
def _(documents, *args, **kwargs):
    ...

The second one preprocesses the QuerySet and then calls the generic cluster() function. A QuerySet is a Django object, but that should not play a role here, apart from the fact that it is hashable and thus usable with lru_cache.

The generic function cannot be cached, though, because it accepts unhashable arguments such as lists. The overloading implementation can be cached, however, because a QuerySet object is hashable. That is why I've added the @lru_cache() decorator.
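
That constraint is easy to check: lru_cache hashes the call arguments to build its cache key, so passing a list fails immediately. A minimal, illustrative sketch (the function name below is made up):

from functools import lru_cache

@lru_cache(maxsize=512)
def cached_cluster(documents):
    # lru_cache hashes the positional and keyword arguments to build the key
    return len(documents)

cached_cluster((1, 2, 3))   # fine: tuples are hashable
cached_cluster([1, 2, 3])   # TypeError: unhashable type: 'list'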

However, caching does not seem to be applied:

qs: QuerySet = [...]

start = datetime.now(); cluster(Document.objects.all()); print(datetime.now() - start)               
0:00:02.629259

I would expect the same call to return in an instant the second time, but:

start = datetime.now(); cluster(Document.objects.all()); print(datetime.now() - start)               
0:00:02.468675

This is confirmed by the cache statistics:

cluster.registry[django.db.models.query.QuerySet].cache_info()
CacheInfo(hits=0, misses=2, maxsize=512, currsize=2)

Changing the order of the @lru_cache and @cluster.register decorators does not seem to make a difference.

This question is similar, but the answer does not apply at the level of an individual function.

Is it even possible to combine these two decorators at this level? If so, how?


Solution

  • hash(Document.objects.all()) == hash(Document.objects.all()) does not hold for a Django QuerySet: QuerySet falls back to the default identity-based __hash__, so every call to Document.objects.all() returns a new object with a different hash, and lru_cache never sees the same key twice.

    The call Document.objects.all() doesn't hit the database until the QuerySet returned is evaluated.
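
    The identity-based hashing can be reproduced without Django; a minimal sketch with a stand-in class (FakeQuerySet and cluster_queryset are made-up names, and the sketch assumes QuerySet likewise inherits object's default __hash__/__eq__):

    from functools import lru_cache

    class FakeQuerySet:
        # Like QuerySet, this class defines neither __hash__ nor __eq__,
        # so hashing and equality fall back to object identity.
        def __init__(self, model):
            self.model = model

    @lru_cache(maxsize=512)
    def cluster_queryset(qs):
        return f"clustered {qs.model}"

    cluster_queryset(FakeQuerySet("Document"))   # miss
    cluster_queryset(FakeQuerySet("Document"))   # miss again: new object, new cache key
    print(cluster_queryset.cache_info())         # CacheInfo(hits=0, misses=2, maxsize=512, currsize=2)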

    "Pickling is usually used as a precursor to caching" (Django docs).

    Depending on your use case, you can try caching the pickle of the QuerySet or of its query attribute.

    import pickle

    @cluster.register(bytes)
    @lru_cache(maxsize=512)
    def _(documents, *args, **kwargs):
        # bytes are hashable and compare by value, so lru_cache can key on them
        documents = pickle.loads(documents)
        ...
    
    cluster(pickle.dumps(Document.objects.all()))
    

    or

    cluster(pickle.dumps(Document.objects.all().query))
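
    If you go with the query attribute, the registered handler has to rebuild a QuerySet from the unpickled query before clustering. A sketch following the restore pattern from the Django pickling docs (this variant replaces the bytes handler above, since both register bytes; delegating via list(qs) is an assumption about what the original QuerySet handler does):

    import pickle
    from functools import lru_cache

    @cluster.register(bytes)
    @lru_cache(maxsize=512)
    def _(documents, *args, **kwargs):
        query = pickle.loads(documents)            # unpickle the Query object
        qs = Document.objects.all()
        qs.query = query                           # restore the original query (Django docs pattern)
        return cluster(list(qs), *args, **kwargs)  # assumption: delegate to the generic implementation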