Tags: reactjs, pdf, pdf-parsing

How to extract the content of a PDF in React.js?


I am trying to load a PDF file from my local storage and then extract its content in React.js, without any backend.

I searched Google for a suitable module but have not found one yet. There are many Node modules for parsing PDFs, and I can extract the content of a PDF on the backend, but I am not sure whether they can be used in web browsers.


Solution

  • I tried this, and pdfjs-dist did not work for me. A better alternative for extracting text from a PDF directly within React was react-pdftotext.

    1. Install the library:

    npm install react-pdftotext
    

    2. Import the library:

    import pdfToText from 'react-pdftotext'
    

    3. Create an input field:

    <input type="file" accept="application/pdf" onChange={extractText}/>
    

    4. Prepare a function:

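    function extractText(event) {
        const file = event.target.files[0]
        if (!file) return // nothing selected (file dialog was cancelled)
        pdfToText(file)
            .then(text => console.log(text))
            // log the underlying error as well, so failures can be diagnosed
            .catch(error => console.error("Failed to extract text from pdf", error))
    }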
    

    Finally, bringing it all together:

    import pdfToText from 'react-pdftotext'
    
    
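    // Change handler: reads the selected file and logs its extracted text
    function extractText(event) {
        const file = event.target.files[0]
        if (!file) return // nothing selected (file dialog was cancelled)
        pdfToText(file)
            .then(text => console.log(text))
            // log the underlying error as well, so failures can be diagnosed
            .catch(error => console.error("Failed to extract text from pdf", error))
    }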
    
    function PDFParserReact() {
    
        return (
            <div className="App">
                <header className="App-header">
                    <input type="file" accept="application/pdf" onChange={extractText}/>
                </header>
            </div>
        );
    }
    export default PDFParserReact;
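
One caveat with the input above: the accept attribute only filters what the file dialog offers, so a user can still select a non-PDF file. A minimal sketch of a guard that could run before calling pdfToText is shown below. The name pickPdfFile is hypothetical and is not part of react-pdftotext; the check on the file extension is an assumed fallback for cases where the browser reports an empty MIME type.

```javascript
// Hypothetical helper (not part of react-pdftotext): returns the selected
// file if it looks like a PDF, otherwise null.
function pickPdfFile(event) {
    const files = event && event.target ? event.target.files : null;
    const file = files && files.length > 0 ? files[0] : null;
    if (!file) return null; // nothing selected (file dialog was cancelled)

    // Accept on MIME type, or fall back to the file extension (assumption:
    // some environments leave file.type empty).
    const hasPdfType = file.type === "application/pdf";
    const hasPdfName =
        typeof file.name === "string" && file.name.toLowerCase().endsWith(".pdf");
    return hasPdfType || hasPdfName ? file : null;
}

// Plain objects stand in for the browser's change event here.
const fakeEvent = { target: { files: [{ name: "report.pdf", type: "application/pdf" }] } };
console.log(pickPdfFile(fakeEvent) !== null); // true
```

Inside extractText, this would replace the direct event.target.files[0] access: call pickPdfFile(event), return early if it gives null, and pass the result to pdfToText otherwise. A file that passes this check can still be a corrupt or non-PDF file with a .pdf name, so the catch handler remains necessary.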
    

    References: https://devnavigator.com/home/text-extraction-from-pdf-in-react-73a6519d-d8ab-4e52-8819-ff39bbb54f2a