I'm working on an encryption/decryption program, and I got it working on text files; however, I cannot open any other formats. For example, if I do:
a_file = open('C:\Images\image.png', 'r', encoding='utf-8')
for a_line in a_file:
    print(a_line)
I get:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\WinPython-64bit-3.4.3.4\python-3.4.3.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 685, in runfile
    execfile(filename, namespace)
  File "C:\WinPython-64bit-3.4.3.4\python-3.4.3.amd64\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 85, in execfile
    exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)
  File "C:/Comp_Sci/Coding/line_read_test.py", line 2, in <module>
    for a_line in a_file:
  File "C:\WinPython-64bit-3.4.3.4\python-3.4.3.amd64\lib\codecs.py", line 319, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
What am I doing terribly wrong?
Short version: You're opening binary files in text mode. Use 'rb' instead of 'r' (and drop the encoding parameter) and you'll be doing it right.
Long version: Python 3 makes a very strict distinction between bytestrings and Unicode strings. The str type contains only Unicode strings; each character of a str is a single Unicode codepoint. The bytes type, on the other hand, represents a series of 8-bit values that do not necessarily correspond to text. E.g., a .PNG file should be loaded as a bytes object, not as a str object. By passing the encoding="utf-8" parameter to open(), you're telling Python that your file contains only valid UTF-8 text, which a .PNG obviously does not. Instead, you should be opening the file as a binary file with 'rb' and not using any encoding. Then you'll get bytes objects rather than str objects when you read the file, and you'll need to treat them differently.
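To make the str/bytes distinction concrete, here is a small sketch. The sample string and variable names are made up for illustration; the eight-byte value is the standard PNG file signature, which starts with the same 0x89 byte the traceback complains about.

```python
# The first eight bytes of every PNG file (the PNG signature).
png_signature = b"\x89PNG\r\n\x1a\n"

text = "h\u00e9llo"                  # a str: five Unicode code points
encoded = text.encode("utf-8")      # str -> bytes
decoded = encoded.decode("utf-8")   # bytes -> str

print(type(text).__name__, len(text))        # str, counted in code points
print(type(encoded).__name__, len(encoded))  # bytes, counted in 8-bit values

# Arbitrary binary data is generally not valid UTF-8, so decoding it fails
# the same way text-mode open() failed on the .PNG file.
try:
    png_signature.decode("utf-8")
except UnicodeDecodeError as exc:
    print("decode failed:", exc)
```

Round-tripping works for real text, but the PNG bytes were never text to begin with, so there is nothing to decode.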
I see that @ignacio-vazquez-abrams has already posted good sample code while I've been typing this answer, so I won't duplicate his efforts. His code is correct: use it and you'll be fine.
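For anyone reading this answer on its own, here's a rough sketch of the 'rb' approach described above. It isn't the code from the other answer; read_in_chunks, the chunk size, and the throwaway demo file are all my own hypothetical choices. Binary files have no meaningful "lines", so this reads fixed-size chunks instead of iterating line by line.

```python
import os
import tempfile


def read_in_chunks(path, chunk_size=4096):
    """Yield successive bytes objects from a file opened in binary mode."""
    # 'rb' means binary read; no encoding argument is given or allowed.
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object signals end of file
                break
            yield chunk


# Demo on a throwaway file so the sketch runs anywhere (hypothetical data:
# the PNG signature followed by every possible byte value).
sample = b"\x89PNG\r\n\x1a\n" + bytes(range(256))
with tempfile.NamedTemporaryFile(delete=False, suffix=".png") as tmp:
    tmp.write(sample)
    demo_path = tmp.name

data = b"".join(read_in_chunks(demo_path, chunk_size=64))
os.remove(demo_path)

print(type(data).__name__, len(data))
```

When you write the processed result back out, open the output file with 'wb' for the same reason.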