I am using qpdf to decrypt PDF files (encrypted, but without a password) because PyPDF2's decryption doesn't work on them.
The command works on the command line, but running it from Python raises a FileNotFoundError.
qpdf --decrypt --replace-input test.pdf # it's working; replacing test.pdf with the absolute path
But from Python it doesn't:
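import subprocess
from PyPDF2 import PdfFileReader

inp_file = open(self.path, "rb")
inp_pdf = PdfFileReader(inp_file)
if inp_pdf.isEncrypted:
    try:
        # Try the empty user password first
        inp_pdf.decrypt('')
    except Exception:
        # PyPDF2 could not decrypt the file; fall back to the qpdf CLI
        subprocess.run(["qpdf", "--decrypt", "--replace-input", self.path])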
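One thing that may be worth checking: a FileNotFoundError raised by subprocess.run itself generally refers to the executable, not to the PDF, so it can mean that qpdf is not on the PATH seen by the Python process. Below is a minimal sketch that resolves the executable explicitly and passes an absolute path to the PDF. The helper names build_qpdf_command and decrypt_in_place and the qpdf_exe parameter are my own, not part of qpdf or PyPDF2.

```python
import os
import shutil
import subprocess


def build_qpdf_command(pdf_path, qpdf_exe=None):
    """Return the argument list for an in-place qpdf decryption."""
    # Resolve the executable explicitly so a missing qpdf is reported clearly
    exe = qpdf_exe or shutil.which("qpdf")
    if exe is None:
        raise FileNotFoundError("qpdf executable not found on PATH")
    # os.fspath accepts str and pathlib.Path; abspath avoids surprises
    # when the working directory differs from the one used in the shell
    return [exe, "--decrypt", "--replace-input",
            os.path.abspath(os.fspath(pdf_path))]


def decrypt_in_place(pdf_path, qpdf_exe=None):
    """Run qpdf; raises CalledProcessError on any non-zero exit status."""
    subprocess.run(build_qpdf_command(pdf_path, qpdf_exe), check=True)
```

With this, the fallback in the except branch would become decrypt_in_place(self.path). If the error instead comes from open(self.path, "rb"), then the path itself is the problem and this sketch will not help.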
I switched to pikepdf. It is built on top of QPDF (brief description quoted below). It makes it very simple to create new PDFs based on existing PDFs, and it handles decryption on the fly.
"Pikepdf provides a Pythonic wrapper around the C++ PDF content transformation library, QPDF."
pikepdf doesn't implement text extraction from PDFs; I used tika for that.
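Roughly, the extraction step can be sketched as follows, assuming the tika Python package is installed (it also needs a Java runtime for the Tika server it talks to). The names extract_text and normalize_content are hypothetical helpers of mine; parser.from_file is the tika call doing the work.

```python
def normalize_content(content):
    """Return extracted text stripped of outer whitespace; None becomes ''."""
    return (content or "").strip()


def extract_text(pdf_path):
    """Extract the text of a (decrypted) PDF via tika."""
    # Imported here so the module loads even where tika is not installed
    from tika import parser

    # from_file returns a dict; its "content" entry holds the text,
    # or None when no text could be extracted
    parsed = parser.from_file(pdf_path)
    return normalize_content(parsed.get("content"))
```

The None handling matters for scanned PDFs, which may contain no text layer at all.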