python-3.x pdf2image

cannot identify image file <_io.BytesIO object at 0x7f8bbdc115f0>


I tried to convert a PDF to an image in Colab. It was working fine until yesterday, but it fails today with the error above. I am not sure what causes the issue.

from pdf2image import convert_from_path
import glob
pdf_dir = glob.glob(r'/content/first_page.pdf')  
img_dir = "/content/" 
for pdf_ in pdf_dir:
   pages = convert_from_path(pdf_, 57)
   pages[0].save('output.'+"jpg", 'JPEG')


Solution

  • Fixed the issue by refreshing the apt package lists and reinstalling poppler-utils:

    !sudo apt-get update
    !apt-get install poppler-utils
    
    # If pdf2image is not installed yet:
    # !pip install pdf2image
    
    
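    import glob
    import os
    
    from pdf2image import convert_from_path
    
    pdf_paths = glob.glob(r'/content/first_page.pdf')  # your PDF path or glob pattern
    img_dir = "/content/"           # your destination image folder
    
    for pdf_path in pdf_paths:
        pages = convert_from_path(pdf_path, 500)  # second argument is the DPI
        # File name without folder or extension, e.g. "first_page"
        base = os.path.splitext(os.path.basename(pdf_path))[0]
        for i, page in enumerate(pages, start=1):
            # Number each page so later pages do not overwrite earlier ones
            page.save(os.path.join(img_dir, f"{base}_{i}.jpg"), 'JPEG')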
    

    Now the PDF is converted to an image smoothly, without any issues.
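
If the error comes back, it may help to first confirm that the Poppler command-line tools pdf2image relies on (`pdfinfo` and `pdftoppm`) are actually on the PATH. The snippet below is only a rough sketch using the standard library; the helper name `missing_poppler_tools` is made up for this example and is not part of pdf2image. It only checks that the executables exist, so it will not catch an install that is present but broken or outdated.

```python
import shutil

# pdf2image shells out to these Poppler command-line tools.
POPPLER_TOOLS = ("pdfinfo", "pdftoppm")


def missing_poppler_tools(tools=POPPLER_TOOLS):
    """Return the executables from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_poppler_tools()
if missing:
    print("Missing Poppler tools:", ", ".join(missing))
    print("In Colab, try: !apt-get update && apt-get install poppler-utils")
else:
    print("Poppler tools found; pdf2image should be able to call them.")
```

If the tools are reported as found and the conversion still fails, reinstalling poppler-utils as shown in the answer is the next thing to try.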