[SOLVED] Render PDF into an image (self-contained, no external command line dependencies) (to use on AWS Lambda)

Render PDF into an image (self-contained, no external command line dependencies) (to use on AWS Lambda)

I need a simple python library to convert PDF to image (render the PDF as is), but after hours of searching, I keep hitting the same wall, I find libraries like pdf2image python library (and many similar ones), which depend on external applications or wrap command-line tools.

Although there are workarounds to allow using these libraries in serverless settings, they all would complicate our deployment and require creating the likes of Execution Environments or extra lambda layers, which will eat up from the small allowed lambda size.

Is there a self-contained, independent mechanism (not dependent on command-line tools) to allow achieving this (seemingly simple) task?

Also, I am wondering, is there a reason (licensing or patents) for the scarcity of tools that deal with PDFs (they are mostly commercial or under strict AGPL licenses)?

Solution

You said "Ended up using pdf2image"

pdf2image (MIT). A python (3.6+) module that wraps pdftoppm (GPL?) and pdftocairo (GPL?) to convert PDF to a PIL Image object.

Generally Poppler (GPL) spinoffs from Open Source Xpdf (GPL) which has

pdftopng:
pdftoppm:
pdfimages:

and a 3rd party pdftotiff