Tags: javascript, node.js, google-cloud-platform, file-upload, google-bucket

How To Chunk And Upload A Large File To Google Bucket


I am trying to upload large files to a Google bucket from Node.js. Uploading any file under or around the 200 MB mark works perfectly fine. Anything greater than that returns an error:

Cannot create a string longer than 0x1fffffe8 characters

Having a file that big, I have found out that Node has a hard limit on how long a single string/blob can be, which is where this error comes from.
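
The same 0x1fffffe8 value shows up in buffer.constants.MAX_STRING_LENGTH, and something like this reproduces the error with no GCS involved at all:

const { constants } = require("buffer");

// Decoding a buffer that is one byte longer than the maximum string length
// throws the same "Cannot create a string longer than 0x1fffffe8 characters" error.
const tooBig = Buffer.alloc(constants.MAX_STRING_LENGTH + 1);
tooBig.toString();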

Here are the two code snippets that both throw the same error. This one uses the streaming upload:

let fileSize = file.size;
fs.createReadStream(file)
  .pipe(
    upload({
      bucket: BUCKET,
      file: file,
    })
  )
  .on("progress", (progress) => {
    console.log("Progress event:");
    console.log("\t bytes: ", progress.bytesWritten);
    const pct = Math.round((progress.bytesWritten / fileSize) * 100);
    console.log(`\t ${pct}%`);
  })
  .on("finish", (test) => {
    console.log(test);
    console.log("Upload complete!");
    resolve();
  })
  .on("error", (err) => {
    console.error("There was a problem uploading the file");
    reject(err);
  });

And this one is just a regular bucket upload:

await storage.bucket(BUCKET)
  .upload(file.path, {
    destination: file.name,
  });

I have come to terms with the fact that the only solution is to chunk the file, upload it in chunks, and rejoin the chunks in the bucket. The problem is that I don't know how to do that, and I can't find any documentation on Google or GitHub for this use case.


Solution

  • To resolve this issue I checked the file size to see whether it was larger than 200 MB. If it was, I split the file into (roughly) 200 MB chunks, uploaded each chunk individually, and then joined them back together in the bucket with bucket.combine().

    A very important note is to add the timeout. By default, Google has a 1-minute upload timeout; I have set it to 60 minutes in the snippet below. It is a very hacky approach, I must admit.

    // This assumes storage (new Storage() from @google-cloud/storage),
    // splitFile (the split-file package), BUCKET and file are already
    // defined in the surrounding code.
    if (file.size > 209715200) {
      // Split the source file into ~200 MB pieces; split-file resolves
      // with the paths of the chunk files it writes to disk.
      await splitFile
        .splitFileBySize(file.path, 2e8)
        .then(async (names) => {
          console.log(names);

          // Upload each chunk as its own object in the bucket.
          for (let i = 0; i < names.length; i++) {
            console.log("uploading " + names[i]);
            await storage
              .bucket(BUCKET)
              .upload(names[i], {
                destination: names[i],
                timeout: 3600000, // 60 minutes instead of the 1 minute default
              })
              .catch((err) => {
                return { status: err };
              });
          }

          // Join the uploaded chunks back into the original file.
          await storage
            .bucket(BUCKET)
            .combine(names, file.name)
            .catch((err) => {
              return { status: err };
            });

          // Delete the chunk objects, leaving only the combined file.
          for (let i = 0; i < names.length; i++) {
            console.log("deleting " + names[i]);
            await storage
              .bucket(BUCKET)
              .file(names[i])
              .delete()
              .then(() => {
                console.log(`Deleted ${names[i]}`);
              })
              .catch((err) => {
                return { status: err };
              });
          }

          console.log("done");
          return { status: "ok" };
        });
    }