python visual-studio-code ocrmypdf

Import ocrmypdf in Visual Studio Code in Python


I would like to import ocrmypdf.

I installed the package using pip install --upgrade --user ocrmypdf

but when I tried to import it in VS Code with:

import ocrmypdf

I got this error:

[WinError 2] The system cannot find the file specified
[WinError 2] The system cannot find the file specified
---------------------------------------------------------------------------
MissingDependencyError                    Traceback (most recent call last)
<ipython-input-9-a81f3474d7ad> in <module>
----> 1 import ocrmypdf

~\AppData\Roaming\Python\Python38\site-packages\ocrmypdf\__init__.py in <module>
      8 from pluggy import HookimplMarker as _HookimplMarker
      9 
---> 10 from ocrmypdf import helpers, hocrtransform, leptonica, pdfa, pdfinfo
     11 from ocrmypdf._concurrent import Executor
     12 from ocrmypdf._jobcontext import PageContext, PdfContext

~\AppData\Roaming\Python\Python38\site-packages\ocrmypdf\leptonica.py in <module>
     42 _libpath = find_library(libname)
     43 if not _libpath:
---> 44     raise MissingDependencyError(
     45         """
     46         ---------------------------------------------------------------------

MissingDependencyError: 
        ---------------------------------------------------------------------
        This error normally occurs when ocrmypdf can't find the Leptonica
        library, which is usually installed with Tesseract OCR. It could be that
        Tesseract is not installed properly, we can't find the installation
        on your system PATH environment variable.

        The library we are looking for is usually called:
            liblept-5.dll   (Windows)
            liblept*.dylib  (macOS)
            liblept*.so     (Linux/BSD)

        Please review our installation procedures to find a solution:
            https://ocrmypdf.readthedocs.io/en/latest/installation.html
        ---------------------------------------------------------------------
        

Solution

  • The error message reports a missing dependency: ocrmypdf could not find the Leptonica library (liblept-5.dll on Windows), which is usually installed with Tesseract OCR. Most likely Tesseract is not installed, or its installation directory is not on your PATH environment variable. Install Tesseract, make sure its directory is on PATH, and the import may then work. The ocrmypdf documentation also states that Tesseract is required for the module to work properly.

    requirements for ocrmypdf
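
To confirm whether the dependency is visible before retrying the import, a small check along these lines may help. This is a minimal sketch, not part of ocrmypdf: the function name check_ocr_dependencies and the candidate library names are assumptions for illustration (the traceback only shows that ocrmypdf calls find_library with some library name, and the error message names liblept-5.dll on Windows). It uses only the standard library.

```python
import shutil
from ctypes.util import find_library


def check_ocr_dependencies(lib_names=("liblept-5", "lept")):
    """Report where (if anywhere) Tesseract and Leptonica are found.

    lib_names is an assumed list of candidate names for the Leptonica
    library; adjust it for your platform if needed.
    """
    # shutil.which searches the PATH environment variable for the executable.
    tesseract_path = shutil.which("tesseract")

    # find_library returns a library name/path string, or None if not found.
    leptonica_path = None
    for name in lib_names:
        leptonica_path = find_library(name)
        if leptonica_path:
            break

    return {"tesseract": tesseract_path, "leptonica": leptonica_path}


# Print one line per dependency so a missing one is easy to spot.
for dep, path in check_ocr_dependencies().items():
    print(f"{dep}: {path or 'NOT FOUND'}")
```

If either entry prints NOT FOUND, the Tesseract installation directory is probably missing from PATH for the Python process that VS Code launched. After changing PATH, restart VS Code (or the Jupyter kernel) so the new value is picked up, then run import ocrmypdf again.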