javascript, single-page-application, tar

How do I extract data from a .tar.gz file (stored in the cloud) in a browser?


Problem

I am making a single-page application that will store its data in one of the major cloud providers' blob storage (for example, Google Cloud Storage). The data in cloud storage is a .tar.gz file, and I want to access it from a browser app.

Inside the tar file there will be hundreds of files, and I just want to get one of them and render it into HTML. I can already load the archive; the question is how to get the data out of it.

Unsurprisingly, I am currently using TypeScript/JavaScript in the single-page application, but that could change if the answer was 'do it this way'.

I'm not worried about browser compatibility (I can specify things like 'only works in this browser'), but the browser doesn't have access to a file system and I can't 'shell out' to the operating system.

What I have tried

I've had a look for npm packages, and the closest I've come is https://github.com/npm/node-tar (but that seems to need a file system). I am reasonably confident working with streams, but after reviewing the documentation it doesn't feel like zlib will do what I want 'out of the box'. I didn't get a lot of hits from searching Google: most just gave the same advice I would give, 'shell out to the operating system and have that do it with tar', but I can't follow that advice in the browser.

My alternative

If this doesn't work I will put a lambda/function in place to do the de-tarring. I like avoiding 'more moving parts' if I can in a project, but this might be needed.


Solution

  • The result should be achievable by using a combination of pako (a fast zlib JavaScript port) and js-untar:

    <script src="pako.min.js"></script>
    <script src="untar.js"></script>
    <script>
    fetch('test.tar.gz').then(res => res.arrayBuffer()) // Download gzipped tar file and get ArrayBuffer
                        .then(pako.inflate)             // Decompress gzip using pako
                        .then(arr => arr.buffer)        // Get ArrayBuffer from the Uint8Array pako returns
                        .then(untar)                    // Untar
                        .then(files => {                // js-untar returns a list of files (See https://github.com/InvokIT/js-untar#file-object for details)
                            console.log(files);
                        });
    </script>
    

    test.tar.gz was made by running tar -czvf test.tar.gz test on a directory containing 3 text files, so that both the directory and the files can be checked in the result.
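
Since the question asks for a single file out of hundreds, the last step of the chain above could select one entry instead of logging them all. The following is a minimal sketch, not part of js-untar: readEntryAsText is a hypothetical helper, and it assumes each entry exposes name (the path inside the archive) and buffer (an ArrayBuffer with the contents), as the js-untar README describes. It also assumes the file is text in the given encoding.

```javascript
// Hypothetical helper (not part of js-untar): return the contents of one
// archive entry as text. Assumes each entry has `name` and `buffer`
// (an ArrayBuffer), as described in the js-untar README.
function readEntryAsText(files, path, encoding = 'utf-8') {
  // Paths are stored relative to the archived directory (e.g. "test/a.txt");
  // some archives prefix them with "./", so strip that before comparing.
  const normalize = (p) => p.replace(/^\.\//, '');
  const wanted = normalize(path);
  const entry = files.find((f) => normalize(f.name) === wanted);
  if (!entry) {
    throw new Error(`No entry named "${path}" in the archive`);
  }
  // TextDecoder is available in current browsers and accepts an ArrayBuffer.
  return new TextDecoder(encoding).decode(entry.buffer);
}
```

In the final .then, something like document.getElementById('output').textContent = readEntryAsText(files, 'test/a.txt') would then render it, where the element id and the file name are placeholders. Assigning to textContent rather than innerHTML keeps the file's contents from being interpreted as markup.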