python python-imaging-library

Python: Using Pillow to convert any format to png, any way to speed the process up?


I'm currently working on an application that decompiles a PDF file into raw text, images, tables, etc. The problem I've run into is that PDFs use many different image formats, so I added a small piece of code that uses Pillow to convert whatever image format was read into PNG.

This is my code at the moment:

import io
import os

from PIL import Image


def convert_to_png(image_bytes, filename):
    try:
        # Decode from memory; Pillow detects the source format itself.
        with Image.open(io.BytesIO(image_bytes)) as img:
            png_filename = os.path.splitext(filename)[0] + '.png'
            # Normalize to RGBA so every output has the same mode.
            img = img.convert('RGBA')
            img.save(png_filename, 'PNG')
            print(f"Converted image to PNG: {png_filename}")
        return png_filename
    except Exception as e:
        print(f"Error converting image: {e}")
        return None

But right now this runs entirely on the CPU, on a single thread, so it is really slow. What are some ways I could speed this code up?

I've tried swapping Pillow out for other libraries, but with little to no performance gain. Since I'm more familiar with Pillow, I'd like to stay either with it or with something similar enough.


Solution

  • Pillow does all of its work on the CPU. One thing you can try is converting the image to a NumPy array and back before saving, so that the pixel data is held as a plain in-memory array of numbers (the same approach is used in machine learning). I tried this on a single image and it was slightly faster, going from 0.58 to 0.45, but that is one measurement on one file, so treat it as an experiment to benchmark on your own images rather than a guaranteed gain.

    Some sources I found (there are many more): https://www.javatpoint.com/how-to-convert-images-to-numpy-array and https://stackoverflow.com/questions/56204630/what-is-the-need-of-converting-an-image-into-numpy-array

    import numpy as np
    from PIL import Image
    
    img = Image.open("test.jpg")
    
    imageArr = np.array(img)
    
    img = Image.fromarray(imageArr)
    
    img.save("testOutput.png")
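
Since the 0.58 to 0.45 figure comes from a single run, it may be worth timing both paths yourself. The following is a minimal sketch for doing that, not part of the original answer: the helper names (`save_direct`, `save_via_numpy`, `time_call`, `make_test_image`) and the synthetic 512x512 test image are assumptions made for illustration, and real images from your PDFs will give more meaningful numbers.

```python
import io
import time

import numpy as np
from PIL import Image


def save_direct(img):
    """Encode the image to PNG bytes with no intermediate step."""
    buf = io.BytesIO()
    img.save(buf, "PNG")
    return buf.getvalue()


def save_via_numpy(img):
    """Round-trip the pixels through a NumPy array, then encode to PNG."""
    arr = np.array(img)
    buf = io.BytesIO()
    Image.fromarray(arr).save(buf, "PNG")
    return buf.getvalue()


def time_call(func, img, repeats=5):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(img)
        best = min(best, time.perf_counter() - start)
    return best


def make_test_image(size=(512, 512)):
    """Build a synthetic RGB image so the sketch needs no file on disk."""
    rng = np.random.default_rng(0)
    width, height = size
    pixels = rng.integers(0, 256, size=(height, width, 3), dtype=np.uint8)
    return Image.fromarray(pixels)


if __name__ == "__main__":
    img = make_test_image()
    print(f"direct:    {time_call(save_direct, img):.4f} s")
    print(f"via numpy: {time_call(save_via_numpy, img):.4f} s")
```

Both functions encode to an in-memory buffer so that disk speed does not distort the comparison; taking the best of several runs reduces noise from other processes.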
    

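    If that doesn't help, you can trade output size for speed. PNG is lossless, so the quality option (which applies to formats such as JPEG) has no effect on it; what you can lower instead is the zlib compression level, for example:

    img.save(imgName, 'PNG', compress_level=1)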
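
Because the question mentions that everything runs on a single thread, another option is to convert several images at once. Below is a rough sketch using the standard library's `concurrent.futures`; it is an illustration under assumptions, not the asker's or answerer's code. The names `_convert_job` and `convert_many`, and the `executor_cls` parameter, are hypothetical. It also passes `compress_level=1` to Pillow's PNG writer, which favours speed over file size (the default is 6).

```python
import io
import os
from concurrent.futures import ProcessPoolExecutor

from PIL import Image


def convert_to_png(image_bytes, filename, compress_level=1):
    """Decode image_bytes and write a PNG next to filename; return its path."""
    png_filename = os.path.splitext(filename)[0] + ".png"
    with Image.open(io.BytesIO(image_bytes)) as img:
        # A low compress_level makes larger files but encodes faster.
        img.convert("RGBA").save(png_filename, "PNG", compress_level=compress_level)
    return png_filename


def _convert_job(job):
    """Unpack one (image_bytes, filename) pair; return None on failure."""
    image_bytes, filename = job
    try:
        return convert_to_png(image_bytes, filename)
    except Exception as exc:
        print(f"Error converting {filename}: {exc}")
        return None


def convert_many(jobs, max_workers=None, executor_cls=ProcessPoolExecutor):
    """Convert (image_bytes, filename) pairs in parallel, preserving order."""
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(_convert_job, jobs))
```

With the default process pool, call `convert_many` from under an `if __name__ == "__main__":` guard, since worker processes may re-import the module on some platforms. Passing `executor_cls=ThreadPoolExecutor` avoids the cost of copying image bytes between processes, but how much it helps depends on how much of the work runs outside the interpreter lock, so measure both on your own files.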