I have this string in my text file: ├░┬č┬Ź┬ć
What is known is that it was originally an emoji, or at least some surrogate-pair character, i.e. one that a JavaScript string stores with length 2 or 4.
For some reason it ended up in this form.
(It was read from a MySQL database using utf8_general_ci through a node.js/mysql2 connection with charset latin1_swedish_ci.)
How can I find out which emoji it was? Is that possible?
Other examples:
├░┬č┬ĺ┬Ž
├░┬č┬ś┬ł
├░┬č┬ą┬Á
An algorithm written in JS would be the best option.
It's double mojibake: the UTF-8 bytes were misread as latin1, re-encoded as UTF-8, and then misread again as cp852. The following Python code snippet reproduces it (sorry, I cannot give a JavaScript equivalent):
print('🍆 💦 😈 🥵'.
encode('utf-8').decode('latin1'). # 1st mojibake stage
encode('utf-8').decode('cp852') # 2nd mojibake stage
) # ├░┬č┬Ź┬ć ├░┬č┬ĺ┬Ž ├░┬č┬ś┬ł ├░┬č┬ą┬Á
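To make the two stages easier to follow, here is a small sketch that exposes the intermediate result for a single emoji. The helper name mojibake_stages is mine, not part of any library; it only wraps the same encode/decode calls used above.

```python
def mojibake_stages(text):
    """Return the text after the 1st and after the 2nd mis-decoding stage."""
    # Stage 1: UTF-8 bytes are wrongly read as latin1 (one char per byte).
    stage1 = text.encode('utf-8').decode('latin1')
    # Stage 2: those chars are re-encoded as UTF-8 and wrongly read as cp852.
    stage2 = stage1.encode('utf-8').decode('cp852')
    return stage1, stage2

stage1, stage2 = mojibake_stages('🍆')
print([f'U+{ord(c):04X}' for c in stage1])  # ['U+00F0', 'U+009F', 'U+008D', 'U+0086']
print(stage1.encode('utf-8').hex(' '))      # c3 b0 c2 9f c2 8d c2 86
print(stage2)                               # ├░┬č┬Ź┬ć
```

Each original UTF-8 byte becomes two bytes after stage 1, which is why every emoji shows up as eight cp852 characters.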
Possible repair (although prevention is better than cure):
print('├░┬č┬Ź┬ć ├░┬č┬ĺ┬Ž ├░┬č┬ś┬ł ├░┬č┬ą┬Á'.
encode('cp852').decode('utf-8'). # fix 2nd mojibake stage
encode('latin1').decode('utf-8') # fix 1st mojibake stage
) # 🍆 💦 😈 🥵
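If the file mixes damaged and undamaged text, the same repair can be wrapped in a function that leaves a string alone when it does not round-trip. This is only a sketch: repair_double_mojibake is a hypothetical name, and it assumes the damage was exactly the latin1-then-cp852 chain shown above.

```python
def repair_double_mojibake(text):
    """Undo the cp852/latin1 double mojibake; return text unchanged on failure."""
    try:
        return (text
                .encode('cp852').decode('utf-8')    # fix 2nd mojibake stage
                .encode('latin1').decode('utf-8'))  # fix 1st mojibake stage
    except UnicodeError:
        # Covers both UnicodeEncodeError and UnicodeDecodeError: the input
        # was not produced by this particular chain, so keep it as it is.
        return text

for broken in ['├░┬č┬Ź┬ć', '├░┬č┬ĺ┬Ž', '├░┬č┬ś┬ł', '├░┬č┬ą┬Á']:
    print(broken, '->', repair_double_mojibake(broken))
```

Plain ASCII passes through unchanged, and an already-correct emoji is returned as is because it cannot be encoded to cp852.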
FYI, those emojis are listed below (column CodePoint contains the Unicode code point (U+hhhh) and the UTF-8 bytes; column Description contains the character name, with the UTF-16 surrogate pair in parentheses):
Char CodePoint Description
---- --------- -----------
🍆 {U+1F346, 0xF0,0x9F,0x8D,0x86} AUBERGINE (0xd83c,0xdf46)
💦 {U+1F4A6, 0xF0,0x9F,0x92,0xA6} SPLASHING SWEAT SYMBOL (0xd83d,0xdca6)
😈 {U+1F608, 0xF0,0x9F,0x98,0x88} SMILING FACE WITH HORNS (0xd83d,0xde08)
🥵 {U+1F975, 0xF0,0x9F,0xA5,0xB5} OVERHEATED FACE (0xd83e,0xdd75)
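The rows above can be regenerated for any character with the standard library. The function describe below is a hypothetical helper of mine; the names come from Python's unicodedata module, so a very new emoji may be missing on an old interpreter (hence the fallback value).

```python
import unicodedata

def describe(char):
    """Format one character like a row of the table above."""
    utf8 = ','.join(f'0x{b:02X}' for b in char.encode('utf-8'))
    # UTF-16 code units; a non-BMP character yields a surrogate pair,
    # which is what JavaScript's string length counts.
    utf16 = char.encode('utf-16-be')
    units = [int.from_bytes(utf16[i:i + 2], 'big')
             for i in range(0, len(utf16), 2)]
    surrogates = ','.join(f'0x{u:04x}' for u in units)
    name = unicodedata.name(char, '<unnamed>')
    return f'{char} {{U+{ord(char):04X}, {utf8}}} {name} ({surrogates})'

print(describe('🍆'))  # 🍆 {U+1F346, 0xF0,0x9F,0x8D,0x86} AUBERGINE (0xd83c,0xdf46)
```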