python qpdf

qpdf raises FileNotFoundError when called from Python


I am using qpdf to decrypt PDF files (encrypted, but without a password) because PyPDF2 decryption doesn't work. It works on the command line, but from Python it raises FileNotFoundError.

qpdf --decrypt --replace-input test.pdf # this works; test.pdf stands in for the absolute path

But from Python it doesn't work:

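import subprocess
from PyPDF2 import PdfFileReader

inp_file = open(self.path, "rb")
inp_pdf = PdfFileReader(inp_file)
if inp_pdf.isEncrypted:
    try:
        # try PyPDF2 first, with an empty password
        inp_pdf.decrypt('')
    except Exception:
        # fall back to the qpdf command-line tool
        subprocess.run(["qpdf", "--decrypt", "--replace-input", self.path])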
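For what it's worth, when subprocess.run is given a list of arguments, a FileNotFoundError usually means the executable itself (here qpdf) could not be found, not the PDF. That is only a guess about this particular case. A minimal sketch for checking it, assuming qpdf is installed somewhere; the helper names build_qpdf_command and decrypt_in_place are made up for illustration:

```python
import shutil
import subprocess


def build_qpdf_command(pdf_path, qpdf_exe=None):
    """Build the argument list for an in-place qpdf decrypt.

    Hypothetical helper: qpdf_exe lets the caller pass a full path to the
    qpdf binary when it is not on the PATH that Python sees.
    """
    exe = qpdf_exe or shutil.which("qpdf")
    if exe is None:
        # Same exception type as in the question, but with a clearer message.
        raise FileNotFoundError("qpdf executable not found on PATH")
    return [exe, "--decrypt", "--replace-input", str(pdf_path)]


def decrypt_in_place(pdf_path, qpdf_exe=None):
    """Run qpdf and return its exit code (hypothetical helper)."""
    cmd = build_qpdf_command(pdf_path, qpdf_exe)
    # No shell involved: each list element is passed to qpdf as one argument.
    return subprocess.run(cmd).returncode
```

If shutil.which("qpdf") returns None inside the Python process but qpdf works in a terminal, the two are likely seeing different PATH values, and passing the full path to the binary as qpdf_exe is one way around it.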

Solution

  • I switched to pikepdf. It is built on top of QPDF; a brief description is quoted below. It makes it very simple to create new PDFs based on existing PDFs, and it handles decryption on the fly.

    Pikepdf provides a Pythonic wrapper around the C++ PDF content transformation library, QPDF.

    It doesn't implement text extraction from PDFs; I used tika for text extraction.
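
The pikepdf approach described above might look roughly like this. It is a sketch, not the exact code I used: the helper names default_output_path and decrypt_with_pikepdf and the "_decrypted" suffix are made up for illustration, and it assumes the PDF is encrypted with an empty user password, as in the question.

```python
from pathlib import Path


def default_output_path(src):
    """Return e.g. docs/test_decrypted.pdf for docs/test.pdf (hypothetical naming)."""
    src = Path(src)
    return src.with_name(src.stem + "_decrypted" + src.suffix)


def decrypt_with_pikepdf(src, dst=None):
    """Write an unencrypted copy of src and return the output path."""
    # Imported here so the module still loads when pikepdf is not installed.
    import pikepdf

    dst = Path(dst) if dst is not None else default_output_path(src)
    # An empty password is enough for PDFs that are encrypted without one.
    with pikepdf.open(src, password="") as pdf:
        # Saving without an encryption argument drops the existing encryption.
        pdf.save(dst)
    return dst
```

Writing to a separate file avoids overwriting the input while it is still open; the decrypted copy can then be handed to whatever does the text extraction.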