I am using pdf.js in a discovery setting to determine the height and width in pixels of a number of PDF documents.
In the following code snippet, I read an 8.5 x 11 Word document that was printed to PDF into a buffer. The dimensions I get back are the size I expect divided by 4.16666... .
I found that if I pass a scale of 4.166666666666667 I get very close to the actual size of the document, usually within a few millionths of a pixel.
function process(images) {
  // All images in the array have the same path
  let pdfdoc = images[0].ImageFilePath
  fs.readFile(pdfdoc, (err, imageBuffer) => {
    // If we failed to read the PDF, mark each page for manual review
    // and stop before handing an undefined buffer to pdf.js.
    if (err) {
      console.error(err)
      images.forEach(img => {
        postMessage({height: -1, width: -1, ImageFilePath: img.ImageFilePath, DocId: img.DocId, PageId: img.PageId})
      })
      return
    }
    // Parse the document once and reuse the same promise for every page
    let u = PDFJSLib.getDocument(imageBuffer)
    images.forEach(img => {
      u.promise.then(pdf => {
        pdf.getPage(img.PageNumber).then(data => {
          console.log(data.getViewport(1).width)
          console.log(data.getViewport(1).height)
        })
      })
    })
  })
}
I expect the natural width and height of each page to be logged to the console. I need to understand what scale I should be passing in, and what factors determine that scale value. Can I safely pass in 4.166666666666667 and know I'm getting the natural height and width of the page each time?
Other questions I've found relating to this usually have to do with passing the PDF to a viewer -- which I am not doing. Again, my goal is to simply discover the natural height and width of a given PDF page.
Thanks!
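On further review of this issue, I determined that the page sizes returned at a scale of 1 assume a DPI of 72, i.e. they are in PDF points (1/72 inch by default). The 4.1666... factor I observed is simply 300 / 72, the ratio between the 300 DPI I was expecting and that default. I can divide the values (612, 792) by 72 and then multiply them by 300 to get my expected numbers: 2550 and 3300.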
let dimensions = data.getViewport(1).viewBox.map(n => n / 72 * 300)
//[ 0, 0, 2550, 3300 ]
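The same arithmetic can be wrapped in a small helper so the target DPI is explicit rather than a magic number. This is only a sketch: `scaleForDpi` and `pageSizeInPixels` are hypothetical names, not part of pdf.js, and it assumes the default unit of 1/72 inch and a `[x1, y1, x2, y2]` box like the `viewBox` above. It also does not account for page rotation.

```javascript
// Hypothetical helpers, not part of pdf.js.
// Assumption: one PDF unit is 1/72 inch (the default user-space unit).
const POINTS_PER_INCH = 72;

// Scale factor that maps PDF points to pixels at the target DPI.
function scaleForDpi(dpi) {
  return dpi / POINTS_PER_INCH;
}

// Convert a [x1, y1, x2, y2] box in points to whole-pixel dimensions.
// Subtracting the origin covers boxes that do not start at (0, 0).
function pageSizeInPixels(viewBox, dpi) {
  const [x1, y1, x2, y2] = viewBox;
  return {
    width: Math.round((x2 - x1) * dpi / POINTS_PER_INCH),
    height: Math.round((y2 - y1) * dpi / POINTS_PER_INCH),
  };
}

// US Letter (612 x 792 points) at 300 DPI
console.log(pageSizeInPixels([0, 0, 612, 792], 300)); // { width: 2550, height: 3300 }
```

Multiplying before dividing and rounding at the end avoids the tiny floating-point residue I was seeing when passing 4.166666666666667 as the scale.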