javascript unicode

How do you convert an emoji to its Unicode escapes in JS?


I want to convert a Unicode character to its Unicode escape sequence.

For example, from the freezing face '🥶' I want to obtain the string '"\\uD83E\\uDD76"'.

They are related in the sense that JSON.parse('"\\uD83E\\uDD76"') === '🥶'

However, JSON.stringify('🥶') === '"🥶"' and JSON.stringify('🥶') !== '"\\uD83E\\uDD76"'.

The string will be written to a file next to the emoji to show both representations, so I really need the escaped version.


Solution

  • You can split the emoji string on "" to get its UTF-16 code units in an array. (split doesn't understand surrogate pairs the way the string iterator does, so it breaks the string into individual code units.) Then map each unit to its escape sequence with charCodeAt(0).toString(16).padStart(4, "0"), prepend the backslash and u, and join it all back together with .join(""):

    const emoji = "🥶";
    const escapeText = emoji.split("").map((unit) => "\\u" + unit.charCodeAt(0).toString(16).padStart(4, "0")).join("");
    console.log(`${emoji} => ${escapeText}`);

    Add a toUpperCase() call if you need the hex digits in uppercase:

    const emoji = "🥶";
    const escapeText = emoji.split("").map((unit) => "\\u" + unit.charCodeAt(0).toString(16).padStart(4, "0").toUpperCase()).join("");
    console.log(`${emoji} => ${escapeText}`);

    I think your mention of JSON just related to how you'd tried to get these sequences, but if you are writing out a JSON file for another system, you should be able to include the emoji character directly in a string in the JSON. JSON is specified to accept the full range of Unicode (see the JSON home page, the standard PDF, and the RFC). (Note that the RFC says that JSON exchanged between systems that aren't part of a closed ecosystem must be encoded in UTF-8.)
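
If you need this in more than one place, the same idea can be wrapped in a small function. This is only a sketch: the name toUnicodeEscapes is made up for illustration, and it uses an index loop over code units instead of split(""), which should give the same result. It also shows the difference between split("") and the string iterator, and checks that the escapes round-trip through JSON.parse as described in the question.

```javascript
// Hypothetical helper (name chosen for illustration, not from the original answer):
// turn every UTF-16 code unit of `text` into a \uXXXX escape.
function toUnicodeEscapes(text) {
  let out = "";
  // Indexing by position visits code units, so a surrogate pair stays as two units.
  for (let i = 0; i < text.length; i++) {
    const hex = text.charCodeAt(i).toString(16).toUpperCase().padStart(4, "0");
    out += "\\u" + hex;
  }
  return out;
}

const emoji = "🥶";
const escaped = toUnicodeEscapes(emoji);
console.log(`${emoji} => ${escaped}`); // 🥶 => \uD83E\uDD76

// split("") yields code units; the string iterator (used by spread) yields code points.
console.log(emoji.split("").length); // 2
console.log([...emoji].length);      // 1

// Round trip: wrapping the escapes in double quotes gives JSON text that parses back to the emoji.
console.log(JSON.parse(`"${escaped}"`) === emoji); // true
```

Because every code unit is escaped, plain ASCII comes out escaped too (for example "A" becomes \u0041). If you only want non-ASCII characters escaped, you would need to add a check on the code unit value before escaping.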