
PaddleOCR issue when passing a PDF file for text detection


Hi, I am facing an issue when passing a PDF file to paddleocr.

My code is:

!paddleocr --image_dir /content/SER-1678793239.pdf --use_angle_cls true --use_gpu false

The error I get is:

AttributeError: 'Document' object has no attribute 'pageCount'

The same command works fine for image files.

I tried several things, such as renaming the PDF file and changing the number of pages, but nothing worked.


Solution

  • I solved the issue by uninstalling the pymupdf library (which had been installed automatically with paddleocr) using the command below:

    !pip uninstall pymupdf
    

    Then I installed a specific version, pymupdf==1.19.0, and the issue was resolved:

    !pip install --ignore-installed pymupdf==1.19.0
    

    Now it's working fine!

    Note: the ! sign in front of a command tells the notebook to run it as a shell command (not as Python code), so if you are running these commands outside a notebook you need to remove the leading ! from each one.
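
To check which pymupdf version is installed before and after the downgrade, something like the sketch below may help. As far as I know, the error appears because newer PyMuPDF releases expose the page count as `page_count` and no longer provide the older camelCase `pageCount` that paddleocr calls here; please treat that as my understanding rather than something confirmed in this thread. The helper names `installed_version` and `get_page_count` are made up for illustration and are not part of paddleocr or PyMuPDF.

```python
import importlib.metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


def get_page_count(doc):
    """Read the page count from a document object under either attribute name.

    Assumption: newer PyMuPDF uses `page_count`, older releases use `pageCount`.
    """
    if hasattr(doc, "page_count"):  # newer snake_case name
        return doc.page_count
    return doc.pageCount  # older camelCase name (the one in the traceback)


if __name__ == "__main__":
    # Prints None if PyMuPDF is not installed in the current environment.
    print("PyMuPDF version:", installed_version("PyMuPDF"))
```

If the printed version is newer than 1.19.0 and the `pageCount` error shows up, pinning the version as described above is the workaround that worked here. Note that `get_page_count` only helps in your own code; it does not change what paddleocr calls internally.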