I have a nice, tested bit of Python PyPDF2 code in a .py file, designed to operate on 'real' OS files. Having debugged it all, I am now trying to incorporate it into a PL/Python function, replacing the files with io.BytesIO() - or whatever mechanism would be the best candidate for a seamless drop-in...
The file reads/writes will now go to PostgreSQL bytea columns. The documents 'in' have been written with PG copy functions, and the byte counts match the on-disk sizes; so far so good.
Original code expected files:
# infile = "myInputPdf.pdf"
# outfile = "myOutputPdf.pdf"
# inputStream = open(infile, "rb") # designed to open OS-based file
# --- Instead: 'document_in' loaded from PG bytea col:
inputStream = io.BytesIO(document_in)
# ---
pdf_reader = PdfFileReader(inputStream, strict=False)
# (lots of code in here, which seems to be working)
outputStream = io.BytesIO() # trying it the python3 way!
pdf_writer.write(outputStream)
(I've assumed the objects should be treated as bytes objects)
Finally:
plan3 = plpy.prepare("UPDATE documents SET document_out=$2 WHERE name=$1", ["varchar"]["varchar"])
ERROR: TypeError: list indices must be integers, not str
(PostgreSQL 11.1, if it matters)
I have done similar things in the past using mkstemp techniques; now I'm trying to grow up into the bytes world!
The second argument in plpy.prepare() is a list. The column type is bytea, not varchar. And you should use bytes (not a file object) to update the column:
plan3 = plpy.prepare("UPDATE documents SET document_out=$2 WHERE name=$1", ["varchar", "bytea"])
outputStream.seek(0)
bytes_out = outputStream.read()
plpy.execute(plan3, ['some name', bytes_out])
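For completeness, here is a minimal sketch of how the pieces might fit together in a single plpython3u function. It is only a sketch under assumptions: the function name process_document and its signature are made up, the table and column names (documents, name, document_in, document_out) are taken from your question, and the page-copying loop stands in for your real PyPDF2 processing:
CREATE OR REPLACE FUNCTION process_document(doc_name varchar) RETURNS void AS $$
import io
from PyPDF2 import PdfFileReader, PdfFileWriter

# Fetch the stored PDF; a bytea value arrives in plpython3u as Python bytes.
plan1 = plpy.prepare("SELECT document_in FROM documents WHERE name=$1", ["varchar"])
document_in = plpy.execute(plan1, [doc_name])[0]["document_in"]

pdf_reader = PdfFileReader(io.BytesIO(document_in), strict=False)
pdf_writer = PdfFileWriter()
# Placeholder for the real processing: copy every page across unchanged.
for i in range(pdf_reader.getNumPages()):
    pdf_writer.addPage(pdf_reader.getPage(i))

outputStream = io.BytesIO()
pdf_writer.write(outputStream)

plan3 = plpy.prepare("UPDATE documents SET document_out=$2 WHERE name=$1", ["varchar", "bytea"])
plpy.execute(plan3, [doc_name, outputStream.getvalue()])
$$ LANGUAGE plpython3u;
Note that outputStream.getvalue() returns the buffer's entire contents as bytes without having to seek(0) first, so the seek/read pair above can be collapsed into a single call.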