Tags: python, python-3.x, character-encoding, pyfpdf

open(..., encoding="") vs str.encode(encoding="")


Question:
What is the difference between open(<name>, "w", encoding=<encoding>) and open(<name>, "wb") + str.encode(<encoding>)? They seem to (sometimes) produce different outputs.

Context:
While using PyFPDF (version 1.7.2), I subclassed the FPDF class and, among other things, added my own output method (taking pathlib.Path objects). While looking at the source of the original FPDF.output() method, I noticed that almost all of it is argument parsing; the only relevant bits are:

#Finish document if necessary
if(self.state < 3):
    self.close()
[...]
f=open(name,'wb')
if(not f):
    self.error('Unable to create output file: '+name)
if PY3K:
    # manage binary data as latin1 until PEP461 or similar is implemented
    f.write(self.buffer.encode("latin1"))
else:
    f.write(self.buffer)
f.close()

Seeing that, my own implementation looked like this:

def write_file(self, file: Path) -> None:
    if self.state < 3:
        # See FPDF.output()
        self.close()
    file.write_text(self.buffer, "latin1", "strict")

This seemed to work: a .pdf file was created at the specified path, and Chrome opened it. But it was completely blank, even though I had added images and text. After hours of experimenting, I finally found a version that worked (produced a non-empty PDF file):

def write_file(self, file: Path) -> None:
    if self.state < 3:
        # See FPDF.output()
        self.close()
    # using .write_text(self.buffer, "latin1", "strict") DOES NOT WORK AND I DON'T KNOW WHY
    file.write_bytes(self.buffer.encode("latin1", "strict"))

Looking at the pathlib.Path source, Path.write_text() uses io.open. As all of this is Python 3.8, io.open and the built-in open() are the same function.

Note: FPDF.buffer is of type str, but holds binary data (a PDF file), probably because the library was originally written for Python 2.
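As a side note on why a str can carry binary data at all: Latin-1 maps code points 0 to 255 one-to-one onto byte values 0 to 255, so encoding with it is lossless for such a string. A minimal sketch (the variable names are only for illustration and are not part of PyFPDF):

```python
# Every possible byte value, 0x00 through 0xFF.
raw = bytes(range(256))

# Decoding as Latin-1 yields one character per byte...
as_text = raw.decode("latin1")

# ...and encoding again restores exactly the same bytes.
round_tripped = as_text.encode("latin1")

print(len(as_text))          # 256
print(round_tripped == raw)  # True
```

So the encoding step itself is not where the data gets damaged.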


Solution

  • Aaaand found it: Path.write_bytes() saves the bytes object as is, and str.encode() doesn't touch the line endings.

    Path.write_text() encodes the str just like str.encode() does, BUT: because the file is opened in text mode, line endings are translated on the way out. In my case every \n became \r\n because I'm on Windows. The PDF in the buffer is binary data, so those extra \r bytes alter its binary streams and shift the byte offsets the file relies on. It has to be written byte for byte, on all platforms.
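The line-ending translation can be reproduced on any platform with a small sketch. Here `buffer` is a made-up stand-in for FPDF.buffer and `write_text_mode` is a hypothetical helper; passing newline="\r\n" to open() forces the translation that Windows applies by default in text mode, while newline="" switches translation off.

```python
import tempfile
from pathlib import Path

# Made-up stand-in for FPDF.buffer: a str that really holds binary data.
buffer = "%PDF-1.3\nstream\n\x00\xff\n\nendstream\n"


def write_text_mode(path: Path, data: str, newline=None) -> bytes:
    """Write data in text mode as Latin-1 and return the bytes that hit the disk."""
    with open(path, "w", encoding="latin1", newline=newline) as f:
        f.write(data)
    return path.read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)

    # Binary write: the encoded bytes are stored untouched.
    binary_path = tmp_dir / "binary.pdf"
    binary_path.write_bytes(buffer.encode("latin1"))
    expected = binary_path.read_bytes()

    # Text mode with "\r\n" translation, as on Windows by default.
    windows_like = write_text_mode(tmp_dir / "crlf.pdf", buffer, newline="\r\n")

    # Text mode with translation disabled.
    untranslated = write_text_mode(tmp_dir / "raw.pdf", buffer, newline="")

print(windows_like == expected)            # False
print(untranslated == expected)            # True
print(len(windows_like) - len(expected))   # 5, one extra "\r" per "\n"
```

With open(..., "w", encoding="latin1", newline="") the text-mode write matches the binary one, but for data that is really binary, write_bytes() (or "wb") states the intent more directly. Path.write_text() itself only accepts a newline argument from Python 3.10 on, so it is not an option on 3.8.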