I have a problem with encoding and decoding in Python. I want to encode a Vietnamese plaintext with my own algorithm, but the algorithm can't handle Vietnamese characters, so I first convert the text to UTF-8 with plaintext.encode('utf-8') and then convert the resulting bytes to a string (because my algorithm only accepts strings). My problem is in the decoding step: after decoding with my algorithm I get back a string that contains the UTF-8 bytes, and I want to turn it back into readable Vietnamese text. I can't use receiveString.decode('utf-8') because "string has no attribute 'decode'". I know strings don't have this method, but how should I handle this?
This is the string I receive:
b'v\\xc3\\xb4 \\xc4\\x91\\xe1\\xbb\\x8bch thi\\xc3\\xaan h\\xe1\\xba\\xa1'
It holds UTF-8 bytes, and I want to decode it, but I get:
'str' object has no attribute 'decode'
The question is rather unclear, but the following snippet may help (the inline comments show the intermediate result after each step):
receive_string = "b'v\\xc3\\xb4 \\xc4\\x91\\xe1\\xbb\\x8bch thi\\xc3\\xaan h\\xe1\\xba\\xa1'"
vietnamese_txt = (receive_string
    .encode()                   # b"b'v\\xc3\\xb4 \\xc4\\x91\\xe1\\xbb\\x8bch thi\\xc3\\xaan h\\xe1\\xba\\xa1'"
    .decode('unicode_escape')   # "b'vÃ´ Ä\x91á»\x8bch thiÃªn háº¡'"
    .encode('latin1').decode()  # "b'vô địch thiên hạ'"
    .lstrip('b').strip("'"))    # 'vô địch thiên hạ'
print(vietnamese_txt) # vô địch thiên hạ
vô địch thiên hạ
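If the received string really is the text produced by calling str() on a bytes object (an assumption based on the sample above), a shorter route may be to parse it back into bytes with ast.literal_eval and then decode it. The sketch below is only an illustration: decode_bytes_repr is a hypothetical helper name, not something from the question. The last lines also show one possible way to avoid the problem at the source, by mapping the UTF-8 bytes to a string with latin-1 instead of str(), assuming the algorithm can cope with characters in the range U+0000 to U+00FF.

```python
import ast


def decode_bytes_repr(text: str, encoding: str = "utf-8") -> str:
    """Turn the text "b'...'" (the result of str(some_bytes)) back into real text.

    Hypothetical helper for illustration; assumes `text` is a valid bytes literal.
    """
    raw = ast.literal_eval(text)  # safely parses the literal into a bytes object
    if not isinstance(raw, bytes):
        raise ValueError("expected the repr of a bytes object")
    return raw.decode(encoding)


receive_string = "b'v\\xc3\\xb4 \\xc4\\x91\\xe1\\xbb\\x8bch thi\\xc3\\xaan h\\xe1\\xba\\xa1'"
print(decode_bytes_repr(receive_string))  # vô địch thiên hạ

# Alternative: never call str() on the bytes in the first place.
# latin-1 maps every byte 0..255 to exactly one character, so the round trip is lossless.
plaintext = "vô địch thiên hạ"
as_str = plaintext.encode("utf-8").decode("latin-1")   # string form to feed the algorithm
restored = as_str.encode("latin-1").decode("utf-8")    # back to the original text
```

Compared with stripping the leading b and the quotes by hand, ast.literal_eval also copes with a literal that uses double quotes, and it raises an error instead of silently returning damaged text when the input is not a bytes literal.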