Question:
What is the difference between open(<name>, "w", encoding=<encoding>) and open(<name>, "wb") + str.encode(<encoding>)? They seem to (sometimes) produce different outputs.
Context:
While using PyFPDF (version 1.7.2), I subclassed the FPDF class and, among other things, added my own output method (taking pathlib.Path objects). Looking at the source of the original FPDF.output() method, I noticed that almost all of it is argument parsing; the only relevant bits are:
#Finish document if necessary
if(self.state < 3):
    self.close()
[...]
f=open(name,'wb')
if(not f):
    self.error('Unable to create output file: '+name)
if PY3K:
    # manage binary data as latin1 until PEP461 or similar is implemented
    f.write(self.buffer.encode("latin1"))
else:
    f.write(self.buffer)
f.close()
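Stripped of the argument parsing and the Python 2 branch, the file-writing part of that excerpt boils down to roughly the following. This is a minimal sketch, not the library's code: save_buffer is a hypothetical name, and the with block replaces the explicit close(). As a side note, open() raises OSError on failure instead of returning a falsy object, so the if(not f) check in the excerpt never fires.

```python
import os
import tempfile


def save_buffer(buffer: str, name: str) -> None:
    """Write a str that holds raw PDF bytes to `name`, byte for byte.

    Hypothetical helper mirroring the PY3K branch of FPDF.output():
    latin-1 maps code points 0-255 onto bytes 0x00-0xFF one-to-one.
    """
    # Binary mode: no newline translation, no implicit encoding.
    with open(name, "wb") as f:
        f.write(buffer.encode("latin1"))


# Usage: a tiny fake buffer containing bare "\n" and non-ASCII bytes.
path = os.path.join(tempfile.mkdtemp(), "out.pdf")
save_buffer("%PDF-1.3\n%\xe2\xe3\n", path)
with open(path, "rb") as f:
    data = f.read()
```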
Seeing that, my own implementation looked like this:
def write_file(self, file: Path) -> None:
    if self.state < 3:
        # See FPDF.output()
        self.close()
    file.write_text(self.buffer, "latin1", "strict")
This seemed to work: a .pdf file was created at the specified path, and Chrome opened it. But it was completely blank, even though I had added images and text. After hours of experimenting, I finally found a version that worked (produced a non-empty PDF file):
def write_file(self, file: Path) -> None:
    if self.state < 3:
        # See FPDF.output()
        self.close()
    # using .write_text(self.buffer, "latin1", "strict") DOES NOT WORK AND I DON'T KNOW WHY
    file.write_bytes(self.buffer.encode("latin1", "strict"))
Looking at the pathlib.Path source, I saw that Path.write_text() uses io.open. Since all of this is Python 3.8, io.open and the built-in open() are the same function.
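A quick way to confirm this, and to see which kind of file object each mode hands back (a sketch; the temporary file name is only for illustration):

```python
import builtins
import io
import os
import tempfile

# In Python 3, io.open is the very same function object as the built-in open().
same_function = io.open is builtins.open

path = os.path.join(tempfile.mkdtemp(), "demo.bin")

# Text mode ("w") wraps the file in a TextIOWrapper, which encodes str
# and translates newlines on write.
with open(path, "w", encoding="latin1") as text_file:
    text_type = type(text_file).__name__

# Binary mode ("wb") gives a BufferedWriter that writes bytes unchanged.
with open(path, "wb") as binary_file:
    binary_type = type(binary_file).__name__

print(same_function, text_type, binary_type)  # True TextIOWrapper BufferedWriter
```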
Note:
FPDF.buffer is of type str but holds binary data (a PDF file), probably because the library was originally written for Python 2.
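Keeping binary data in a str like this is lossless as long as every code point is below 256, because latin-1 maps code points 0 to 255 onto the bytes 0x00 to 0xFF one-to-one. A small round-trip check (a sketch, not PyFPDF code):

```python
# Every possible byte value, in order.
raw = bytes(range(256))

# Decoding with latin-1 yields a str with exactly one code point per byte...
as_text = raw.decode("latin1")

# ...and encoding it back reproduces the original bytes exactly.
round_trip = as_text.encode("latin1")

print(len(as_text), round_trip == raw)  # 256 True
```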
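Aaaand found it: Path.write_bytes() saves the bytes object as is, and str.encode() doesn't touch the line endings.
Path.write_text() encodes the string just like str.encode(), BUT: because the file is opened in text mode, it also translates newlines as the text is written (before it is encoded). In my case that converts every \n to \r\n, because I'm on Windows. A PDF, however, has to be written byte for byte on all platforms: the cross-reference table stores the byte offset of every object, and compressed streams (images, fonts) contain arbitrary bytes, so every inserted \r corrupts the file.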
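The translation can be reproduced on any platform by passing newline="\r\n" explicitly, which makes text mode behave the way it does by default on Windows. The sketch below (the sample buffer and file names are illustrative) writes the same str three ways and compares the resulting bytes. Passing newline="" to open() disables the translation; note that Path.write_text() only accepts a newline argument from Python 3.10 onward, so on 3.8 write_bytes() (or open(..., "wb")) is the straightforward fix.

```python
import tempfile
from pathlib import Path

# Illustrative stand-in for FPDF.buffer: PDF syntax with bare "\n" line ends.
buffer = "%PDF-1.3\n1 0 obj\n<< >>\nendobj\n"
folder = Path(tempfile.mkdtemp())

# 1) Text mode with Windows-style translation: every "\n" becomes "\r\n".
translated = folder / "translated.pdf"
with open(translated, "w", encoding="latin1", newline="\r\n") as f:
    f.write(buffer)

# 2) Text mode with newline="": no translation, output matches the buffer.
untranslated = folder / "untranslated.pdf"
with open(untranslated, "w", encoding="latin1", newline="") as f:
    f.write(buffer)

# 3) The working approach from the question: encode, then write raw bytes.
raw = folder / "raw.pdf"
raw.write_bytes(buffer.encode("latin1", "strict"))

expected = buffer.encode("latin1")
print(translated.read_bytes() == expected.replace(b"\n", b"\r\n"))  # True
print(untranslated.read_bytes() == expected)  # True
print(raw.read_bytes() == expected)  # True
print(translated.stat().st_size - len(expected))  # 4, one extra byte per "\n"
```

Those four extra bytes are exactly what shifts every later object away from the offset recorded for it.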