
Django REST Framework cached view returning 2 different payloads


I'm experiencing a strange issue with my Django REST Framework paginated list view, which uses a 2-hour cache. If I repeatedly make requests to the view's endpoint, I sometimes get Response 1 (x bytes in size) and sometimes get Response 2 (y bytes in size).

The view code is as follows:

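from django.utils.decorators import method_decorator
from django.views.decorators.cache import cache_page
from rest_framework import generics
from rest_framework.pagination import PageNumberPagination

# MyListModel and MyListSerializer are defined elsewhere in the project.

class MyListView(generics.ListAPIView):
    model = MyListModel
    serializer_class = MyListSerializer
    pagination_class = PageNumberPagination
    # Note: this sets page_size on the shared PageNumberPagination class itself.
    pagination_class.page_size = 1000

    def get_queryset(self):
        # Filter by the region captured from the URL, newest first.
        region = self.kwargs.get('region')
        sort_param = '-date_created'
        return MyListModel.objects.filter(region=region).order_by(sort_param)

    # Cache the rendered GET response for 2 hours.
    @method_decorator(cache_page(2*60*60))
    def get(self, *args, **kwargs):
        return super().get(*args, **kwargs)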

I'm not sure if this is relevant, but once a day I run a cron job that clears all cached views using the following code:

from django.core.cache import cache

cache.clear()

I have confirmed that the difference in response data from this endpoint is not due to the cache expiring and being replaced with new data. I have also confirmed that the data in the database for MyListModel is not being changed at all. I have also confirmed that the region parameter is consistent between requests.

I'm at a loss as to how I could be getting 2 different responses from this endpoint. Cache or no cache, the underlying data is not changing, so the response should be consistent. This leads me to believe that there are somehow 2 cached responses being held, and that sometimes Response 1's cached data is returned and sometimes Response 2's. But I still do not know by what mechanism this could occur.


Solution

  • Django is often not the process that answers HTTP requests directly, although it can be. Typically some components sit in between. One example is Gunicorn, which starts several processes ("workers") up front and then routes each request to one of them, depending on which workers are currently available. This increases the server's throughput.

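    Depending on how the Django web server is configured, this can result in two or more caches, along with other duplicated state. With Django's local-memory cache backend, for example, each process keeps its own private cache. Clearing one of the caches then helps only the Django instance that owns it, not (necessarily) the others.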

    This is one of the (many) reasons a database is used: work done by one server has to be readable by another. Data that one Django process changes only in its own memory can therefore cause a lot of trouble when a second Django process has to read it.

    It is therefore worth checking how the server is configured to run, and how the caches map to the Django instance(s).
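
The mechanism described above can be illustrated with a small, framework-free sketch. It is not Django or Gunicorn code: `Worker`, `simulate`, the URL, and the row values are all hypothetical stand-ins, and the sketch assumes that something feeding the response differed between the moments each worker first rendered it.

```python
import itertools


class Worker:
    """Stands in for one server process with its own in-memory cache."""

    def __init__(self, name):
        self.name = name
        self.cache = {}  # private to this worker, like a per-process cache

    def handle(self, url, render):
        # Serve from this worker's cache if present; otherwise render and store.
        if url not in self.cache:
            self.cache[url] = render()
        return self.cache[url]


def simulate():
    rows = ["a", "b"]  # hypothetical data behind the endpoint
    workers = [Worker("w1"), Worker("w2")]
    url = "/items/eu/"  # hypothetical URL

    def render():
        # Snapshot of the data at render time.
        return list(rows)

    first = workers[0].handle(url, render)   # w1 renders and caches 2 rows
    rows.append("c")                         # assumed change before w2 sees the URL
    second = workers[1].handle(url, render)  # w2 renders and caches 3 rows

    # Later requests are spread over both workers, so the payload alternates
    # even though nothing is rendered again.
    round_robin = itertools.islice(itertools.cycle(workers), 4)
    later = [worker.handle(url, render) for worker in round_robin]
    return first, second, later
```

Calling `simulate()` returns the two payloads plus four later responses that alternate between them. If the deployment turns out to look like this, pointing Django's `CACHES` setting at a backend shared by all processes (Memcached or Redis, for instance) gives every worker the same cache, and a `cache.clear()` issued from a separate process then reaches it as well.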