Tags: javascript, amazon-s3, gzip, pako

Extracting gzip data in JavaScript with Pako - encoding issues


I am trying to handle what I expect is a very common use case:

I need to download a gzip file (of complex JSON datasets) from Amazon S3 and decompress (gunzip) it in JavaScript. I have everything working correctly except the final 'inflate' step.

I am using Amazon API Gateway and have confirmed that the Gateway is properly transferring the compressed file (I used curl and 7-Zip to verify that the correct data is coming out of the API). Unfortunately, when I try to inflate the data in JavaScript with Pako, I get errors.

Here is my code (note: response.data is the binary data transferred from AWS):

apigClient.dataGet(params, {}, {})
      .then( (response) => {
        console.log(response);  //shows response including header and data

        const result = pako.inflate(new Uint8Array(response.data), { to: 'string' });
        // ERROR HERE: 'buffer error'  

      }).catch ( (itemGetError) => {
        console.log(itemGetError);
      });

I also tried a version that splits the binary data into an array of character codes before inflating:

const charData = response.data.split('').map(function(x){return x.charCodeAt(0); });
const binData = new Uint8Array(charData);
const result = pako.inflate(binData, { to: 'string' });
//ERROR: incorrect header check

I suspect the issue is with the encoding of the data: I am not getting it into a format that Uint8Array (and therefore pako) can make sense of.

Can anyone point me in the right direction to get this working?

For clarity:

I have also tried some of the example processing found halfway through this issue: https://github.com/nodeca/pako/issues/15, but that didn't help (I might be misunderstanding binary format vs. array vs. base64).
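
A quick sanity check that helps narrow this kind of problem down (a minimal sketch, assuming bytes holds whatever is about to be passed to pako.inflate): a valid gzip stream always starts with the magic bytes 0x1f 0x8b, so if those are missing, the data was mangled by the transport or encoding before the inflate step, which matches the 'incorrect header check' error.

// Minimal sanity check (illustrative, not from the original post):
// a valid gzip stream always starts with the magic bytes 0x1f 0x8b.
function looksLikeGzip(bytes) {
  const u8 = bytes instanceof Uint8Array ? bytes : new Uint8Array(bytes);
  return u8.length >= 2 && u8[0] === 0x1f && u8[1] === 0x8b;
}

// e.g. looksLikeGzip(binData) === false here means the bytes were corrupted
// by the encoding/transport, not by pako or the compression itself.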


Solution

  • I was able to figure out my own problem. It was related to the format of the data as read in by JavaScript (either JavaScript itself or the Angular HttpClient implementation). I was reading it in as "binary", but that was not the same representation recognized/used by pako. When I read the data in as base64 and then converted it to binary with atob, I was able to get it working. Here is what I actually implemented (starting from fetching the file from S3 storage).

    1) Build an AWS API Gateway endpoint that reads a previously stored *.gz file from S3.

    At this point you should be able to download a base64 version of your binary file via the URL (test it in a browser or with curl).

    2) I then had API Gateway generate the SDK and used the corresponding apigClient.{get} call.

    3) Within the call, translate base64 -> binary string -> Uint8Array, and then decompress/inflate it. My code for that:

        apigClient.myDataGet(params, {}, {})
          .then( (response) => {
            // HttpClient result is in response.data
            // decode the incoming base64 into a binary string
            const strData = atob(response.data);

            // split the binary string into an array of byte values
            const charData = strData.split('').map(function(x){return x.charCodeAt(0); });

            // wrap the byte values in a typed array that pako can consume
            const binData = new Uint8Array(charData);

            // inflate (gunzip) and decode the result as a string
            const result = pako.inflate(binData, { to: 'string' });
            console.log(result);
          }).catch( (itemGetError) => {
            console.log(itemGetError);
          });
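
    As a footnote (an alternative sketch, not part of the accepted solution above): if the endpoint can return the raw binary body instead of base64 (for API Gateway that typically means configuring binary media types), the atob/charCodeAt round trip can be skipped by requesting an ArrayBuffer and handing the bytes straight to pako. The URL below is hypothetical.

        // Alternative sketch (hypothetical URL): fetch the gzip file as raw bytes,
        // skipping the base64 -> atob -> charCodeAt conversion entirely.
        fetch('https://example.com/my-data.gz')            // hypothetical endpoint
          .then((response) => response.arrayBuffer())      // raw bytes, no base64 step
          .then((buffer) => {
            // pako accepts a Uint8Array view over the raw buffer
            const result = pako.inflate(new Uint8Array(buffer), { to: 'string' });
            console.log(result);
          })
          .catch((err) => console.log(err));

    The trade-off is that the base64 route works with the generated API Gateway SDK as-is, while the raw-bytes route depends on the gateway being configured to pass binary through untouched.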