I am converting Word files to PDF or HTML for a preview page. The conversion runs in a server-side job queue (Node.js) and uses the latest pandoc (3.2.1). However, I suspect that letting Node.js run a shell command is not a safe approach.
Is it safe? Is there a better way to do this? (This is a separate file-processing server with no permission to access other resources, so it should be safe even if the code is not, LOL.)
Here is part of the code in the queue job:
const fileKey = 'uploads/xxxx.docx'; // the files are stored in the storage service's uploads dir
let filePath = this.downloadToLocalTmp(fileKey); // download using the storage key, not the still-undefined filePath
let outputPath = tmpdir() + path.sep + (fileKey.substring(fileKey.lastIndexOf('/')));
filePath = filePath.replaceAll(' ', ''); // on @joesv's advice: strip spaces so `; rm -rf /` becomes `;rm-rf/`
outputPath = outputPath.replaceAll(' ', '');
try {
// using pandoc in next release (1.2.0)
if (isUsePandoc) {
// note: pandoc does not support .doc
Logger.warn('using pandoc converting');
const command = `pandoc --embed-resources -o ${outputPath} ${filePath}`;
Logger.debug(`exec command: '${command}'`);
const stdout = execSync(command, { timeout: timeout });
Logger.debug(`exec command stdout: ${stdout.toString()}`);
} else {
// note: libreoffice supports both .doc and .docx
Logger.warn('using libreoffice converting');
await libreOfficeFileConverter.convertFile(filePath, tmpdir(), 'pdf');
}
Logger.debug('convertWordFile finished : ' + filePath);
return outputPath;
} catch (error) {
Logger.error('convertWordFile error : ' + error);
throw error;
}
// ... upload to storage service
Calling pandoc usually means that it has access to the file system, which can sometimes be exploited via specially crafted documents. See the "a note on security" section in the pandoc manual.
A more secure method would be to run pandoc as a server (pandoc server), as this ensures that pandoc has no access to the file system. Or use the --sandbox flag, which gives you similar guarantees. In that case, using exec should be fine.
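As a rough illustration of the --sandbox suggestion, here is a minimal sketch that also avoids the shell entirely by passing arguments as an array to execFileSync. The helper names buildPandocArgs and convertWithPandoc, the default timeout, and the choice of an HTML output path are assumptions made for this example, not part of the original code.

```javascript
const { execFileSync } = require('node:child_process');
const path = require('node:path');

// Hypothetical helper: builds the argument vector for pandoc.
// Each path is a single argv element, so spaces or characters such as ';'
// in a file name are never interpreted by a shell.
function buildPandocArgs(inputPath, outputPath) {
  return [
    '--sandbox',            // limit pandoc's IO to the files named on the command line
    '--embed-resources',    // same flag as in the question
    '-o', path.resolve(outputPath),
    // An absolute path cannot start with '-', so it cannot be mistaken for an option.
    path.resolve(inputPath),
  ];
}

// Hypothetical wrapper: runs pandoc directly, without spawning a shell.
function convertWithPandoc(inputPath, outputPath, timeoutMs = 30000) {
  const args = buildPandocArgs(inputPath, outputPath);
  // execFileSync executes the binary itself; throws on non-zero exit or timeout.
  const stdout = execFileSync('pandoc', args, { timeout: timeoutMs });
  return stdout.toString();
}
```

With this shape, a call such as convertWithPandoc(filePath, outputPath) no longer needs the replaceAll(' ', '') workaround, because nothing is concatenated into a command string. One caveat from the pandoc manual: in sandbox mode some readers and writers need data files that a standard binary loads from disk, so a pandoc build with embedded data files may be required.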