javascript service-worker workbox cachestorage

Workbox ExpirationPlugin with maxEntries of 1 seems to not enforce any limit on the number of entries


In a PWA, I have a big data file that periodically gets updated. I figured it'd be nice to keep the latest version cached in my service worker for offline support, but only the latest version. I don't want old versions of the data file hanging around using disk space.

I'm using Workbox (version 6.1.2) so I tried writing this in my service worker:

registerRoute(
    new RegExp("/gen/real-player-data-*"),
    new CacheFirst({
        cacheName: "real-player-data",
        plugins: [
            new ExpirationPlugin({
                maxEntries: 1,
                purgeOnQuotaError: true,
            }),
        ],
    }),
);

For completeness, my full service worker is:

import * as googleAnalytics from "workbox-google-analytics";
import { ExpirationPlugin } from "workbox-expiration";
import {
    cleanupOutdatedCaches,
    createHandlerBoundToURL,
    precacheAndRoute,
} from "workbox-precaching";
import { CacheFirst } from "workbox-strategies";
import { NavigationRoute, registerRoute } from "workbox-routing";

registerRoute(
    new RegExp("/gen/real-player-data-*"),
    new CacheFirst({
        cacheName: "real-player-data",
        plugins: [
            new ExpirationPlugin({
                maxEntries: 1,
                purgeOnQuotaError: true,
            }),
        ],
    }),
);

// Will be filled in by tools/build-sw.js
precacheAndRoute(self.__WB_MANIFEST);

const handler = createHandlerBoundToURL("/index.html");
const navigationRoute = new NavigationRoute(handler, {
    denylist: [
        new RegExp("^/files"),
        new RegExp("^/fonts"),
        new RegExp("^/gen"),
        new RegExp("^/ico"),
        new RegExp("^/img"),
        new RegExp("^/manifest"),
        new RegExp("^/sw.js"),
    ],
});
registerRoute(navigationRoute);

// https://developers.google.com/web/tools/workbox/guides/migrations/migrate-from-v3
cleanupOutdatedCaches();

googleAnalytics.initialize();

My data files are named /gen/real-player-data-HASH.json so I figure this will do what I want - notice when my app requests a new version of my data file, add it to the cache, and remove the old one.

In practice, this seems to only partially work. It does create the cache and store the data there, but old versions of the file never seem to get deleted.

Try it yourself. Going to https://play.basketball-gm.com/new_league/real will install the service worker and request the data file, if you let it fully load. You might need to reload it once to see it show up in the Chrome dev tools. The latest version of the data file is https://play.basketball-gm.com/gen/real-player-data-2a6c8e9b0b.json at the time of writing this question:

[Screenshot: Chrome devtools showing one entry in the real-player-data cache]

(Side note: I'm not sure why it says Content-Length is 0; you can clearly see the data in the bottom pane.)

Now, with the service worker installed, if you just go to an old version of my data file like https://play.basketball-gm.com/gen/real-player-data-540506bc45.json that should get picked up by the route defined above, and I believe it should result in the previous file being removed from the cache.

It does indeed get picked up, but now there are two entries in the cache; the first one did not get deleted:

[Screenshot: Chrome devtools showing 2 entries in the real-player-data cache]

And they aren't just empty placeholders; you can view the data in both files in the bottom pane.

Loading more old versions adds more entries to the list, with no apparent limit:

https://play.basketball-gm.com/gen/real-player-data-18992d5073.json

https://play.basketball-gm.com/gen/real-player-data-fe8f297ea7.json

https://play.basketball-gm.com/gen/real-player-data-fd28409152.json

[Screenshot: Chrome devtools showing 5 entries in the real-player-data cache]
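
The same check can be done without the Application panel by listing the cache's keys from the devtools console. This is a minimal sketch using the standard Cache API; `listCachedUrls` is a hypothetical helper name, and the `cacheStorage` parameter exists only so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: resolves to the URL of every request stored in one cache.
// `cacheStorage` defaults to the global `caches` object that browsers expose
// to pages and service workers.
async function listCachedUrls(cacheName, cacheStorage = globalThis.caches) {
    const cache = await cacheStorage.open(cacheName);
    const requests = await cache.keys();
    return requests.map((request) => request.url);
}

// In the devtools console of a page on the same origin:
// listCachedUrls("real-player-data").then((urls) => console.log(urls));
```

If maxEntries: 1 were being enforced, the resulting array would settle at a single URL shortly after each new data file is fetched.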

I'm using Chrome 89 on Ubuntu.

Any idea what I'm doing wrong? Or is there some better way to achieve my goal?

Next day update

I did a bit of console.log debugging within my service worker. It seems that Workbox is basically working correctly, except for this block of code, which is what actually deletes old entries from the cache:

    for (const url of urlsExpired) {
      await cache.delete(url, this._matchOptions);
    }

Here's what I edited it to, for debugging:

    console.log('urlsExpired', urlsExpired);
    console.log('keys', await cache.keys());
    for (const url of urlsExpired) {
      console.log('delete', url, this._matchOptions);
      const deleted = await cache.delete(url, this._matchOptions);
      console.log('after delete', url, deleted);
    }
    console.log('keys2', await cache.keys());

And here's the output I see when I do what I wrote above (load the service worker, load the 1st data file, load the 2nd data file, observe this output as it tries and fails to delete the 1st data file from the cache):

[Screenshot: console output from the debugging statements above]

So it does identify the old URL it needs to delete from the cache. It does see both the old and new URLs in the cache. But that cache.delete call resolves to false. MDN says:

resolves to true if the cache entry is deleted, or false otherwise

This article says:

If it doesn't find the item, it resolves to false.

So I guess that implies it's not finding the item? But the screenshots show that the URL matches an entry in the cache, and MDN says the first argument to cache.delete can be a Request object or a URL.

Is this a bug in Chrome? A bug in Workbox? Something else? I'm out of ideas here.
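
One way to narrow this down is to test the lookup separately from the deletion. cache.match accepts the same query options as cache.delete, so comparing a default lookup with one that sets ignoreVary shows whether header-based matching is what makes the entry invisible. This is a minimal sketch; `probeVaryMismatch` is a hypothetical helper name, not part of Workbox or the Cache API.

```javascript
// Hypothetical diagnostic: looks up `url` in `cache` twice, once with default
// matching and once ignoring the response's Vary header, without deleting anything.
async function probeVaryMismatch(cache, url) {
    // cache.match resolves to undefined when no entry matches.
    const strictMatch = await cache.match(url);
    const looseMatch = await cache.match(url, { ignoreVary: true });
    return {
        matchesByDefault: strictMatch !== undefined,
        matchesIgnoringVary: looseMatch !== undefined,
        // The Vary header of the stored response, if an entry was found at all.
        vary: looseMatch ? looseMatch.headers.get("Vary") : null,
    };
}

// Inside the service worker, for example:
// const cache = await caches.open("real-player-data");
// console.log(await probeVaryMismatch(cache, urlThatFailedToDelete));
```

If matchesByDefault is false while matchesIgnoringVary is true, the entry exists but a plain URL lookup does not match it, which would explain cache.delete resolving to false.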


Solution

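  • The problem was that the responses for my data file set the "Vary" header. By default, cache.delete takes Vary into account when matching, so the delete call in ExpirationPlugin did not match the cached entry. ExpirationPlugin only works here when the ignoreVary option is enabled, which means passing matchOptions: { ignoreVary: true } in its config.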
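
To illustrate why a lookup by bare URL can miss an entry that is visibly in the cache, here is a simplified model of the matching rule. It is a sketch, not the browser's implementation: `matchesCachedEntry` and the entry shape are hypothetical, header names are assumed to be stored in lowercase, and the "Accept" values are made up for the example rather than taken from my actual responses.

```javascript
// Looks up a header in a plain object whose keys are lowercase header names.
function headerValue(headers, name) {
    const value = headers[name.toLowerCase()];
    return value === undefined ? null : value;
}

// Simplified model: an entry matches when the URLs are equal and every header
// named in the stored response's Vary header has the same value on the stored
// request and on the lookup request.
function matchesCachedEntry(entry, query, options = {}) {
    if (entry.url !== query.url) return false;
    const vary = headerValue(entry.responseHeaders, "vary");
    if (options.ignoreVary || vary === null) return true;
    for (const field of vary.split(",").map((name) => name.trim())) {
        if (field === "*") return false;
        if (headerValue(entry.requestHeaders, field) !== headerValue(query.headers, field)) {
            return false;
        }
    }
    return true;
}

// Illustrative entry: stored from a request that carried an Accept header.
const entry = {
    url: "https://example.com/gen/real-player-data-HASH.json",
    requestHeaders: { accept: "application/json" },
    responseHeaders: { vary: "Accept" },
};
// A lookup by URL string alone carries no such header.
const lookup = { url: entry.url, headers: {} };

console.log(matchesCachedEntry(entry, lookup)); // false
console.log(matchesCachedEntry(entry, lookup, { ignoreVary: true })); // true
```

The first result corresponds to cache.delete resolving to false even though the URL is listed in the cache. The second corresponds to the behavior once ignoreVary is set.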