Tags: python-3.x, text, iso-8859-1

Read .txt with emoji characters in Python


I am trying to read a chat history that contains smileys, but I get the following error:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 38: character maps to <undefined>

My code looks like this:

file_name = "chat_file.txt"
chat = open(file_name)
chatText = chat.read() # read data
chat.close()
print(chatText)

I am pretty certain that it's because of elements like: ❤

How can I specify the correct Unicode Transformation Format? In other words, what is the correct file encoding so that Python can read these characters?


Solution

  • Never open text files without specifying their encoding.

    Also, use with blocks. They call .close() automatically, so you don't have to.

    file_name = "chat_file.txt"
    
    with open(file_name, encoding="utf8") as chat:
        chat_text = chat.read()
    
    print(chat_text)
    

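    ISO-8859-1 is a legacy single-byte encoding limited to 256 characters, so it cannot represent emoji. A text file that contains emoji has to use a Unicode encoding, and the most common one is UTF-8. When no encoding is given, open() falls back to the platform's default encoding, which is where the 'charmap' error comes from.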
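
To see why the default fails, here is a minimal sketch that decodes the heart character by hand. It assumes the platform default on the asker's machine was cp1252 (common on Windows, but not confirmed in the question); the variable names are made up for illustration.

```python
heart = "❤"

# UTF-8 stores this character as three bytes: e2 9d a4
utf8_bytes = heart.encode("utf-8")

# Decoding with the matching encoding restores the original character.
round_trip = utf8_bytes.decode("utf-8")

# Assumption: cp1252 stands in for the platform default ("charmap") codec.
# Byte 0x9d has no mapping in cp1252, so decoding raises an error.
try:
    utf8_bytes.decode("cp1252")
    cp1252_failed = False
except UnicodeDecodeError as exc:
    cp1252_failed = True
    failing_byte = exc.object[exc.start]  # the byte that could not be decoded

# ISO-8859-1 maps every byte to some character, so it never raises,
# but the result is three unrelated characters instead of one heart.
latin1_text = utf8_bytes.decode("iso-8859-1")

print(utf8_bytes, round_trip, cp1252_failed, hex(failing_byte), len(latin1_text))
```

Note the difference between the two legacy encodings: cp1252 fails loudly on byte 0x9d, the same byte reported in the question, while ISO-8859-1 silently produces garbage. Passing encoding="utf8" to open() avoids both.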