
pdfjs-dist import error (ERR_REQUIRE_ESM) even though the rest of the project imports fine


I am trying to introduce the pdfjs-dist library into my Node.js server. However, it throws an import error:

Error [ERR_REQUIRE_ESM]: require() of ES Module C:\Users\zjric\auto-filing\node_modules\pdfjs-dist\build\pdf.mjs not supported.
Instead change the require of C:\Users\zjric\auto-filing\node_modules\pdfjs-dist\build\pdf.mjs to a dynamic import() which is available in all CommonJS modules.
    at Object.<anonymous> (C:\Users\zjric\auto-filing\utils\pdf_to_text.js:1:18) {
  code: 'ERR_REQUIRE_ESM'
}

I assume this has to do with my package.json file and the CommonJS vs. ES module (require() vs. import) differences.

const pdfjsLib = require('pdfjs-dist');

const getTextFromPDF = async (path) =>{
    let doc = await pdfjsLib.getDocument(path).promise;
    let page1 = await doc.getPage(1);
    let content = await page1.getTextContent();
    return content.items.map(function(item){
        return item.str;
    });
}

getTextFromPDF('./demo.pdf').then(data => console.log(data));

module.exports = { getTextFromPDF }

Changing my package.json to "type": "module" isn't realistic, as all of my other infrastructure already uses require('module') and runs perfectly fine. I presume I need to modify the library itself, but I don't know how to do that.


Solution

  • I usually run into situations like this when working on an older project but wanting to use newer modules that use ES6 imports/exports.

    What I would do in this situation is first check for a .default export on the required module. So in your case it would be:

    const pdfjsLib = require('pdfjs-dist').default

    If that doesn't work, use the solution Node itself recommends in the error message, which is a dynamic import(). In this particular case the ERR_REQUIRE_ESM error is thrown by require() itself, before .default is ever read, so going straight to import() is the fix.

    Here is an example:

    const getTextFromPDF = async (path) => {
      // import() is available in CommonJS files and can load ES modules like pdfjs-dist
      const pdfjsLib = await import('pdfjs-dist');
      const doc = await pdfjsLib.getDocument(path).promise;
      const page1 = await doc.getPage(1);
      const content = await page1.getTextContent();
      // Each text item carries its text in the `str` property
      return content.items.map((item) => item.str);
    };

    getTextFromPDF('./demo.pdf').then((data) => console.log(data));

    module.exports = {
      getTextFromPDF
    };
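The "try .default first, otherwise fall back to import()" approach described above can be wrapped in a small helper. This is a minimal sketch under my own assumptions: loadModule is a hypothetical name (it is not part of Node.js or pdfjs-dist), and also catching ERR_REQUIRE_ASYNC_MODULE assumes a newer Node.js version where require() can load some ES modules but rejects ones that use top-level await. Note that when a module has a default export, the helper returns only that export and drops any named exports.

```javascript
// Hypothetical helper (not part of Node.js or pdfjs-dist): load a package from a
// CommonJS file whether it ships as CommonJS or as an ES module.
const REQUIRE_ESM_ERRORS = new Set([
  "ERR_REQUIRE_ESM",          // older Node.js: require() of an ES module is not supported
  "ERR_REQUIRE_ASYNC_MODULE", // newer Node.js: the ES module uses top-level await
]);

async function loadModule(specifier) {
  let mod;
  try {
    // Works for CommonJS packages (and for many ES modules on newer Node.js).
    mod = require(specifier);
  } catch (err) {
    // Anything other than "this is an ES module" (e.g. MODULE_NOT_FOUND) is rethrown.
    if (!REQUIRE_ESM_ERRORS.has(err.code)) throw err;
    // Dynamic import() is available in CommonJS and can load ES modules.
    mod = await import(specifier);
  }
  // Unwrap a default export if there is one; otherwise return the module
  // (or the ES module namespace object) itself.
  return mod != null && mod.default !== undefined ? mod.default : mod;
}

module.exports = { loadModule };
```

Inside an async function such as getTextFromPDF you would then write const pdfjsLib = await loadModule('pdfjs-dist'); as far as I know pdfjs-dist exposes only named exports, so the namespace object (with getDocument on it) is what comes back.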