pythonpython-3.xunicodefile-iodecode

UnicodeDecodeError: 'charmap' codec can't decode byte X in position Y: character maps to <undefined>


I'm trying to get a Python 3 program to do some manipulations with a text file filled with information. However, when trying to read the file I get the following error:

Traceback (most recent call last):  
  File "SCRIPT LOCATION", line NUMBER, in <module>  
    text = file.read()
  File "C:\Python31\lib\encodings\cp1252.py", line 23, in decode  
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 2907500: character maps to `<undefined>`  

After reading this Q&A, see How to determine the encoding of text if you need help figuring out the encoding of the file you are trying to open.


Solution

  • The file in question is not using the CP1252 encoding. It's using another encoding. Which one you have to figure out yourself. Common ones are Latin-1 and UTF-8. Since 0x90 doesn't actually mean anything in Latin-1, UTF-8 (where 0x90 is a continuation byte) is more likely.

    You specify the encoding when you open the file:

    file = open(filename, encoding="utf8")