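I'm doing something similar to this website with my data. My data contains Unicode escapes in the format below, and the following code, which decodes such a string (UTF-8 byte values stored as individual UTF-16 characters) into a normal string, works: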
```javascript
function decodeFBEmoji (fbString) {
  // Convert the string to an array of byte values (one per character)
  const codeArray = fbString // starts as '\u00f0\u009f\u0098\u00a2'
    .split('')
    .map(char => char.charCodeAt(0)); // convert '\u00f0' to 0xf0
  // result is [0xf0, 0x9f, 0x98, 0xa2]

  // Convert plain JavaScript array to Uint8Array
  const byteArray = Uint8Array.from(codeArray);
  // Decode byte array as a UTF-8 string
  return new TextDecoder('utf-8').decode(byteArray); // '😢'
}
```
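For context, the garbled form can be reproduced by encoding text as UTF-8 and then turning each byte into its own character, which is what happens when UTF-8 data is read as if it were Latin-1. The sketch below assumes that is how the data was produced; `toMojibake` is a hypothetical helper, not part of any library.

```javascript
// Hypothetical helper (assumption: the data was UTF-8 read as Latin-1).
// Encode text to UTF-8 bytes, then make one character per byte.
function toMojibake(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  return String.fromCharCode(...bytes);         // one char per byte value
}

const garbled = toMojibake('😢');
console.log(garbled === '\u00f0\u009f\u0098\u00a2'); // true
console.log(garbled.length);                        // 4

// Reversing it: character codes back to bytes, then decode as UTF-8.
const bytes = Uint8Array.from(garbled, ch => ch.charCodeAt(0));
console.log(new TextDecoder('utf-8').decode(bytes) === '😢'); // true
```

Spreading the bytes into `String.fromCharCode` is fine for short strings like this one; very long inputs would need to be processed in chunks.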
I am trying to extract these escaped sequences from a longer text string and replace them with the decoded characters so that a proper emoji is displayed. I tried to use a regex to extract the escape sequences, but the string prints as random garbage characters and the regex returns null. How can I replace the escaped sequence with its emoji without changing the rest of the text?
```javascript
function replaceEmoji(text){
  let str = "lorem ipsum lorem ipsum \u00e2\u009d\u00a4\u00ef\u00b8\u008f lorem ipsum";
  let res = str.match(/[\\]\w+/g);
  console.log(str);
  console.log(res); //Result is null
}
```
[Image: console output of the above code]

Edit: [Image: regex pattern I tested]
You're trying to decode some UTF-8, but you're mixing up JS string escapes and bytes.

When you type `\uXXXX` in a string literal, you type an escape for a single UTF-16 code unit (just like `\n` is an escape for a newline), so, for instance, `"\u0041" == "A"` is true.
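A short check of this point, using the question's own regex. `String.raw` is used here only to build a string that really does contain a backslash, for comparison.

```javascript
// "\u00e2" in source code is an escape: the resulting string holds ONE
// character (code unit 0xE2), not the six characters \ u 0 0 e 2.
const escaped = "\u00e2";
console.log(escaped.length);              // 1
console.log(escaped.includes("\\"));      // false: no backslash in the string
console.log(escaped.match(/[\\]\w+/g));   // null, as in the question
console.log("\u0041" === "A");            // true

// String.raw keeps the backslash, so the string is six characters long
// and the same regex now finds a match.
const literal = String.raw`\u00e2`;
console.log(literal.length);              // 6
console.log(literal.match(/[\\]\w+/g));   // [ '\\u00e2' ]
```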
This is why your regex cannot match anything: there is actually no backslash `\` in the string. It's not clear in what form your UTF-8 is coming in, but if it is as you wrote it, each character holds the value of one UTF-8 byte, so the characters need to be turned back into bytes and decoded like so:
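If the goal is to locate the garbled part rather than a backslash, a regex can match the characters themselves. The sketch below assumes the surrounding text is plain ASCII, so any run of characters in U+0080 to U+00FF stands for UTF-8 bytes of 0x80 and above. Some of these, such as `\u009d` and `\u008f`, are C1 control characters, which is part of why the raw string prints as garbage.

```javascript
const str = "lorem ipsum lorem ipsum \u00e2\u009d\u00a4\u00ef\u00b8\u008f lorem ipsum";

// Match runs of characters in U+0080..U+00FF (assumption: each one is a
// single UTF-8 byte that was stored as a character).
const runs = str.match(/[\u0080-\u00ff]+/g);
console.log(runs.length);    // 1
console.log(runs[0].length); // 6
```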
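```javascript
// Turn each character into its code unit. Here every code unit is <= 0xFF,
// so each one fits in a byte (larger values would wrap modulo 256).
const utf8 = new Uint8Array(
  Array.prototype.map.call(
    "lorem ipsum lorem ipsum \u00e2\u009d\u00a4\u00ef\u00b8\u008f lorem ipsum",
    c => c.charCodeAt(0)
  )
);
// Decode the bytes as UTF-8: prints "lorem ipsum lorem ipsum ❤️ lorem ipsum"
console.log(new TextDecoder('utf8').decode(utf8));
```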
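The snippet above decodes the whole string at once, which only works when the rest of the text is ASCII. To replace just the garbled sequences and leave everything else untouched, one option is to combine the character-range regex with `String.prototype.replace` and a replacement function. This is a sketch under the same assumption (runs of U+0080 to U+00FF are UTF-8 bytes); `decodeUtf8Runs` is a hypothetical name. Using `{ fatal: true }` makes `TextDecoder` throw on invalid UTF-8, so a genuine Latin-1 character such as `é` is kept instead of becoming U+FFFD.

```javascript
// Hypothetical helper: decode each run of "byte characters" (U+0080..U+00FF)
// as UTF-8 and splice the result back in, leaving all other text as-is.
function decodeUtf8Runs(text) {
  // fatal: true makes decode() throw on invalid UTF-8 instead of emitting U+FFFD
  const decoder = new TextDecoder('utf-8', { fatal: true });
  return text.replace(/[\u0080-\u00ff]+/g, run => {
    const bytes = Uint8Array.from(run, ch => ch.charCodeAt(0));
    try {
      return decoder.decode(bytes);
    } catch {
      // Not valid UTF-8 (e.g. a real Latin-1 character): keep the original run.
      return run;
    }
  });
}

console.log(decodeUtf8Runs("lorem ipsum \u00e2\u009d\u00a4\u00ef\u00b8\u008f lorem ipsum"));
// lorem ipsum ❤️ lorem ipsum
console.log(decodeUtf8Runs("caf\u00e9")); // café (left unchanged)
```

A run that mixes real Latin-1 text with encoded bytes, with no separator between them, is left entirely undecoded by this approach.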