Javascript library for Microsoft Office add-ins allows you to get raw content of the DOCX file through getFileAsync()
api, which returns a slice of up to 4MB in one go. You keep calling the function using a sliding window approach till you have read entire content. I need to upload these slices to the server and the join them back to recreate the original DOCX file.
I'm using axios on the client-side and busboy-based express-chunked-file-upload middleware on my node server. As I call getFileAsync
recursively, I get a raw array of bytes that I then convert to a Blob
and append to FormData
before post
ing it to the node server. The entire thing works and I get the slice on the server. However, the chunk that gets written to the disk on the server is much larger than the blob I uploaded, normally of the order of 3 times, so it is obviously not getting what I sent.
My suspicion is that this may have to do with stream encoding, but the node middleware does not expose any options to set encoding.
Here is the current state of code:
Client-side
public sendActiveDocument(uploadAs: string, sliceSize: number): Promise<boolean> {
return new Promise<boolean>((resolve) => {
Office.context.document.getFileAsync(Office.FileType.Compressed,
{ sliceSize: sliceSize },
async (result) => {
if (result.status == Office.AsyncResultStatus.Succeeded) {
// Get the File object from the result.
const myFile = result.value;
const state = {
file: myFile,
filename: uploadAs,
counter: 0,
sliceCount: myFile.sliceCount,
chunkSize: sliceSize
} as getFileState;
console.log("Getting file of " + myFile.size + " bytes");
const hash = makeId(12)
this.getSlice(state, hash).then(resolve(true))
} else {
resolve(false)
}
})
})
}
private async getSlice(state: getFileState, fileHash: string): Promise<boolean> {
const result = await this.getSliceAsyncPromise(state.file, state.counter)
if (result.status == Office.AsyncResultStatus.Succeeded) {
const data = result.value.data;
if (data) {
const formData = new FormData();
formData.append("file", new Blob([data]), state.filename);
const boundary = makeId(12);
const start = state.counter * state.chunkSize
const end = (state.counter + 1) * state.chunkSize
const total = state.file.size
return await Axios.post('/upload', formData, {
headers: {
"Content-Type": `multipart/form-data; boundary=${boundary}`,
"file-chunk-id": fileHash,
"file-chunk-size": state.chunkSize,
"Content-Range": 'bytes ' + start + '-' + end + '/' + total,
},
}).then(async res => {
if (res.status === 200) {
state.counter++;
if (state.counter < state.sliceCount) {
return await this.getSlice(state, fileHash);
}
else {
this.closeFile(state);
return true
}
}
else {
return false
}
}).catch(err => {
console.log(err)
this.closeFile(state)
return false
})
} else {
return false
}
}
else {
console.log(result.status);
return false
}
}
private getSliceAsyncPromise(file: Office.File, sliceNumber: number): Promise<Office.AsyncResult<Office.Slice>> {
return new Promise(function (resolve) {
file.getSliceAsync(sliceNumber, result => resolve(result))
})
}
Server-side
This code is totally from the npm package (link above), so I'm not supposed to change anything in here, but still for reference:
makeMiddleware = () => {
return (req, res, next) => {
const busboy = new Busboy({ headers: req.headers });
busboy.on('file', (fieldName, file, filename, _0, _1) => {
if (this.fileField !== fieldName) { // Current field is not handled.
return next();
}
const chunkSize = req.headers[this.chunkSizeHeader] || 500000; // Default: 500Kb.
const chunkId = req.headers[this.chunkIdHeader] || 'unique-file-id'; // If not specified, will reuse same chunk id.
// NOTE: Using the same chunk id for multiple file uploads in parallel will corrupt the result.
const contentRangeHeader = req.headers['content-range'];
let contentRange;
const errorMessage = util.format(
'Invalid Content-Range header: %s', contentRangeHeader
);
try {
contentRange = parse(contentRangeHeader);
} catch (err) {
return next(new Error(errorMessage));
}
if (!contentRange) {
return next(new Error(errorMessage));
}
const part = contentRange.start / chunkSize;
const partFilename = util.format('%i.part', part);
const tmpDir = util.format('/tmp/%s', chunkId);
this._makeSureDirExists(tmpDir);
const partPath = path.join(tmpDir, partFilename);
const writableStream = fs.createWriteStream(partPath);
file.pipe(writableStream);
file.on('end', () => {
req.filePart = part;
if (this._isLastPart(contentRange)) {
req.isLastPart = true;
this._buildOriginalFile(chunkId, chunkSize, contentRange, filename).then(() => {
next();
}).catch(_ => {
const errorMessage = 'Failed merging parts.';
next(new Error(errorMessage));
});
} else {
req.isLastPart = false;
next();
}
});
});
req.pipe(busboy);
};
}
So it looks like I have found the problem at least. busboy
appears to be writing my array of bytes as text in the output file. I get 80,75,3,4,20,0,6,0,8,0,0,0,33,0,44,25
(as text) when I upload the array of bytes [80,75,3,4,20,0,6,0,8,0,0,0,33,0,44,25]
. Now need to figure out how to force it to write it as a binary stream.
Figured out. Just in case it helps anyone, there was no problem with busboy
or office.js
or axios
. I just had to convert the incoming chunk of data to Uint8Array
before creating a blob from it. So instead of:
formData.append("file", new Blob([data]), state.filename);
like this:
const blob = new Blob([ new Uint8Array(data) ])
formData.append("file", blob, state.filename);
And it worked like a charm.