google-chrome-devtools service-worker recovery cachestorage

How can I export cached files saved in a browser using CacheStorage?


I have a website that uses the CacheStorage API to save various files via a Service Worker. For reasons beyond my control, many of these files have been lost from the server they were loaded from. However, I have just realised that several hundred of them are still cached locally in a browser that accessed the site frequently over a period of years (luckily, the site hadn't been cleaning up its cache properly). I can preview the files using Chrome's DevTools, but when I click "download" it attempts to fetch a copy from the server (where the file no longer exists) rather than giving me the locally cached version.

What's the simplest way to do a one-off export of these files (bearing in mind there are a few hundred of them)? I have full access to the computer the browser is running on, and to the domain that the site and service worker run on. It doesn't need to be a pretty solution; once the files are restored, I plan to learn plenty of lessons to prevent something similar happening in future.


Solution

  • Responses added to the CacheStorage API are stored on disk. For example, Chrome on macOS stores them in ~/Library/Application Support/Google/Chrome/Default/Service Worker/CacheStorage. Inside this directory there is a directory for each domain, and within those, a separate directory for each cache used by that domain. The directory names at both levels don't appear to be human-readable, so you may need to search the file contents to find the specific cache you're looking for.
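
    Since the directory names are opaque, one way to narrow things down is to search the entry files for a fragment of a URL you know was cached. The following is only a rough sketch: find_cache_dirs is a hypothetical helper name, and the URL fragment in the usage line is a placeholder to replace with part of one of your own URLs.

    ```shell
    #!/bin/bash
    # Hypothetical helper: list the directories under a CacheStorage root
    # that contain at least one file mentioning the given URL fragment.
    find_cache_dirs() {
        local root="$1" needle="$2"
        # -r recurses, -l prints only the names of matching files, -F treats
        # the fragment as a fixed string. LC_ALL=C makes grep work byte-wise,
        # which avoids encoding errors on binary data.
        LC_ALL=C grep -rlF -- "$needle" "$root" 2>/dev/null \
            | while IFS= read -r match; do dirname "$match"; done \
            | sort -u
    }
    ```

    Usage might look like: find_cache_dirs "$HOME/Library/Application Support/Google/Chrome/Default/Service Worker/CacheStorage" "/music/images/artists/". Each line of output is a directory worth inspecting by hand.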

    Within the directory for each cache, every response is saved in a different file. These are binary files and contain various bits of info, including the URL requested (near the top) and the HTTP response headers (towards the end). Between these, you'll find the body of the HTTP response.
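
    Before writing any extraction logic, it can help to see which URL each entry file belongs to. Here is a minimal sketch, assuming the URL is stored as plain ASCII and is followed by a non-printable byte (which was the case for binary bodies such as images, but may not hold for text responses). entry_url is a hypothetical helper name.

    ```shell
    #!/bin/bash
    # Hypothetical helper: print the first http(s) URL found in one entry file.
    entry_url() {
        # -a treats the binary file as text, -o prints only the matched part.
        # In the C locale [[:graph:]] covers printable, non-space ASCII only,
        # so the match stops at the first binary byte after the URL.
        LC_ALL=C grep -a -o 'https\{0,1\}://[[:graph:]]*' "$1" | head -n 1
    }
    ```

    To list every entry in the current cache directory with its URL, something like this should do: for f in *_0; do echo "$f $(entry_url "$f")"; done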

    The exact logic for extracting the bodies and saving them as files usable elsewhere will vary based on URL scheme, file format, etc. This bash script worked for me:

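    #!/bin/bash
    # Run from inside the directory of the cache to export; entry files end in _0.
    
    mkdir -p export
    for file in *_0
    do
        # Take the image filename from the request URL near the top of the entry.
        output=$(LC_ALL=C sed -nE 's%^.*/music/images/artists/542x305/([^\.]*\.jpg).*%\1%p;/jpg/q' "$file")
        if [ -z "$output" ]
        then
            echo "file $file missing music URL"
            continue
        fi
    
        # Skip entries whose cached response headers record a 404.
        if LC_ALL=C grep -q 'x-backend-status.*404' "$file"
        then
            echo "$file returned a 404"
            continue
        fi
    
        path="export/$output"
    
        # Keep everything from the URL onwards, strip the URL itself, then stop
        # at the first line containing "GET", leaving just the response body.
        LC_ALL=C sed -n '/music\/images\/artists/,$p' "$file" \
            | LC_ALL=C sed 's%^.*/music/images/artists/542x305/[^\.]*\.jpg%%g' \
            | LC_ALL=C sed -n '/GET.*$/q;p' > "$path"
        echo "$file -> $path"
    done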
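
    Because the script cuts the body out with text-based patterns, it is worth checking the exported files before relying on them. A minimal sketch, assuming the files are JPEGs as in my case: every JPEG starts with the two bytes FF D8, so a file that doesn't is probably a bad cut. is_jpeg is a hypothetical helper name.

    ```shell
    #!/bin/bash
    # Hypothetical helper: succeed if the file starts with the JPEG
    # start-of-image marker (bytes FF D8).
    is_jpeg() {
        # od prints the first two bytes as hex; tr removes od's spacing.
        [ "$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')" = "ffd8" ]
    }
    ```

    For example: for f in export/*.jpg; do is_jpeg "$f" || echo "$f looks wrong"; done. This only checks the start of each file, so it won't catch a body that was cut short.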