python pdf jpeg pdf2image

Multiple errors when I try to convert a PDF to JPEG


I need to convert a .pdf file to a .jpeg file so that I can run OCR on the text. I found this code:

from pdf2image import convert_from_path
pages = convert_from_path('img732.pdf', 500)
for i, page in enumerate(pages):
  # Use a distinct name per page so later pages do not overwrite earlier ones
  page.save(f'out_{i}.jpg', 'JPEG')

And I got this error:

Traceback (most recent call last):
File "C:\Users\david\AppData\Local\Programs\Python\Python39\lib\site-packages\pdf2image\pdf2image.py", line 441, in pdfinfo_from_path
proc = Popen(command, env=env, stdout=PIPE, stderr=PIPE)
File "C:\Users\david\AppData\Local\Programs\Python\Python39\lib\subprocess.py", line 951, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Users\david\AppData\Local\Programs\Python\Python39\lib\subprocess.py", line 1420, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\david\OneDrive\Desktop\SMEpy\prova!!!.py", line 2, in <module>
pages = convert_from_path('img732.pdf', 500)
File "C:\Users\david\AppData\Local\Programs\Python\Python39\lib\site-packages\pdf2image\pdf2image.py", line 97, in convert_from_path
page_count = pdfinfo_from_path(pdf_path, userpw, poppler_path=poppler_path)["Pages"]
File "C:\Users\david\AppData\Local\Programs\Python\Python39\lib\site-packages\pdf2image\pdf2image.py", line 467, in pdfinfo_from_path
raise PDFInfoNotInstalledError(
pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?

The .pdf file is in the same directory as the .py file. What is the problem?


Solution

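  • The problem is not the location of the PDF. pdf2image is a wrapper around Poppler's command-line tools, and the traceback shows that it could not find pdfinfo. Installing Poppler and adding it to PATH fixes this: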

    1. Download the Poppler tools for Windows (I recommend the latest version):
      http://blog.alivate.com.au/poppler-windows/
    2. After downloading, extract the archive to a folder at any path.
    3. Add Poppler's "bin" folder to the PATH environment variable.
    4. Restart your Python workspace so that it picks up the new PATH.
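
To check whether the steps above worked, or to avoid editing PATH at all, something like the following may help. It is only a sketch: the helper names (poppler_available, page_filename, convert_pdf_to_jpegs) and their parameters are made up for illustration. The one pdf2image feature it relies on is the poppler_path argument of convert_from_path, which is visible in the traceback in the question. It also gives every page its own file name so that pages do not overwrite each other.

```python
import shutil
from pathlib import Path


def poppler_available(poppler_bin=None):
    """Return True if the pdfinfo executable can be located.

    poppler_bin is an optional path to Poppler's "bin" folder (hypothetical
    parameter name). When it is None, only the PATH variable is searched.
    """
    path = str(poppler_bin) if poppler_bin is not None else None
    return shutil.which("pdfinfo", path=path) is not None


def page_filename(pdf_path, number):
    """Build a per-page output name, e.g. img732.pdf -> img732_page1.jpg."""
    return f"{Path(pdf_path).stem}_page{number}.jpg"


def convert_pdf_to_jpegs(pdf_path, out_dir=".", dpi=500, poppler_bin=None):
    """Save each page of pdf_path as its own JPEG and return the file paths."""
    # Imported here so the check above can run even without pdf2image installed.
    from pdf2image import convert_from_path

    # poppler_path=None makes pdf2image look for Poppler on PATH;
    # otherwise it uses the given "bin" folder directly.
    pages = convert_from_path(pdf_path, dpi, poppler_path=poppler_bin)

    out_paths = []
    for number, page in enumerate(pages, start=1):
        out_path = Path(out_dir) / page_filename(pdf_path, number)
        page.save(out_path, "JPEG")
        out_paths.append(out_path)
    return out_paths
```

If poppler_available() returns False after restarting, PATH is probably still missing the "bin" folder. In that case you can call convert_pdf_to_jpegs('img732.pdf', poppler_bin=r'C:\poppler\bin'), where C:\poppler\bin is a placeholder for wherever you extracted Poppler.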