I have a PDF file that I convert to jpeg. What I get is a list of images:
[<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1700x2200 at 0x7F0FF46CDC10>,
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1700x2200 at 0x7F0FE6651750>,
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1700x2200 at 0x7F0FE6657450>,
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1700x2200 at 0x7F0FE6657550>,
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1700x2200 at 0x7F0FE6657650>,
<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1700x2200 at 0x7F0FE6657790>]
I need to pass each of them to my AWS Textract pipeline to extract the tables and text.
The issue is that I can't pass these objects directly: they are in-memory PIL images, not files that I can open. How can I read or load such objects?
I believe you want a JPEG-encoded image in a memory buffer:
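import io
from PIL import Image
# YourImage is one of the PIL images from your list, e.g. images[0]
# Encode your PIL Image as a JPEG without writing to disk
buffer = io.BytesIO()
YourImage.save(buffer, format='JPEG', quality=75)
# You probably want the raw bytes; getvalue() returns a bytes object,
# whereas getbuffer() returns a memoryview, which boto3 may not accept
desiredObject = buffer.getvalue()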
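To apply this to every page, something like the following might work. It is only a sketch: the helper names `image_to_jpeg_bytes` and `analyze_pages` are made up for illustration, and it assumes you call Textract synchronously through boto3's `analyze_document` with the image passed as `Document={'Bytes': ...}`. The client is passed in as an argument, so you would create it yourself with `boto3.client('textract')`.

```python
import io


def image_to_jpeg_bytes(image, quality=75):
    """Encode a PIL image as JPEG in memory and return the raw bytes."""
    buffer = io.BytesIO()
    image.save(buffer, format="JPEG", quality=quality)
    return buffer.getvalue()


def analyze_pages(images, textract_client):
    """Send each page image to Textract and collect the responses."""
    responses = []
    for image in images:
        response = textract_client.analyze_document(
            Document={"Bytes": image_to_jpeg_bytes(image)},
            FeatureTypes=["TABLES"],  # assumption: you mainly want tables
        )
        responses.append(response)
    return responses


# Usage (requires boto3 and AWS credentials; not run here):
# import boto3
# responses = analyze_pages(images, boto3.client("textract"))
```

Each response should contain a `Blocks` list with the detected lines, words, and table cells. If your pages are large, lowering `quality` is one way to keep each request small.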