unicode emoji

Why was emojifying existing characters in Unicode problematic?


Reading the FAQ for Emoji submission to Unicode, I found this question:

Q: Can existing pictographic characters be emojified?

A: No proposals to emojify existing characters are accepted any longer. Such proposals were accepted in the past, but that proved to be problematic for a variety of reasons.

However, I couldn't find any explanation of this on the Internet.


Solution

  • Emoji characters are generally expected to be displayed as colourful pictographs, more akin to embedded images than regular writing. Not only can this create a jarring contrast with the surrounding text, it also makes certain styling options (such as text colour) impossible to apply.

    Normally this wouldn’t be much of a problem because people know which characters are emoji and use them accordingly. However, if a non-emoji character is emojified after the fact, already existing documents making use of that character will suddenly contain emoji where there weren’t meant to be any, potentially breaking the design in a way the author couldn’t have prepared for.

    Unicode does have a solution for this, at least in principle. Every emoji character is categorised as either emoji-default or text-default. If a character has both emoji and non-emoji uses, its default presentation can be overridden by appending a special character called a variation selector (U+FE0E for text style, U+FE0F for emoji style), allowing users to pick which presentation they prefer. Naturally, characters that were not originally meant to be emoji are text-default, so if they become emoji later on they should continue to display as before.

    The trouble is that pretty much nothing actually handles these default values properly. If you enter a character like U+2603 ☃ SNOWMAN in Twitter or Discord or Mastodon, it will be shown as emoji-style even though the Unicode Standard clearly states that it is supposed to be text-style by default (because it predates the inclusion of emoji in Unicode) and only become emoji-style when using a variation selector. The same is true for virtually all text-default emoji characters, including those that are being emojified belatedly, leading to messages retroactively changing appearance. The variation selector mechanism also isn’t universally supported, so even if you went back and manually edited those old entries to include the correct variation selectors, there would still be no guarantee that they’d show up as intended for everyone.

    To minimise the impact of these issues, the Unicode Consortium decided to stop emojifying existing characters and instead give emoji status only to newly encoded characters, even if an otherwise suitable character already exists in the standard. For example, the relatively recent U+1FAAF 🪯 KHANDA represents the exact same symbol as the older U+262C ☬ ADI SHAKTI, only the former is an emoji while the latter is exclusively text-style.

    Note that this decision also applies in reverse: Already existing emoji characters will no longer be used for non-emoji purposes. For example, the character U+1F40D 🐍 SNAKE was originally planned to be used for representing a symbol from the old Sharp MZ character set as proposed in this document, but it was later decided to instead encode an entirely new snake character that isn’t “tainted” by emoji presentation for this purpose.
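As a rough illustration of the variation selector mechanism described above, here is a minimal Python sketch. The helper names `as_text` and `as_emoji` are made up for this example, and the code does not check whether a given character actually has a registered emoji variation sequence; it only shows how the selectors U+FE0E and U+FE0F are appended to a base character. Whether the result is rendered differently still depends on the font and the platform, as explained above.

```python
# Variation selectors that request a specific presentation style.
VS15 = "\uFE0E"  # VARIATION SELECTOR-15: text presentation
VS16 = "\uFE0F"  # VARIATION SELECTOR-16: emoji presentation


def strip_selectors(s: str) -> str:
    """Remove any existing text/emoji presentation selectors from s."""
    return s.replace(VS15, "").replace(VS16, "")


def as_text(ch: str) -> str:
    """Hypothetical helper: request text-style presentation for ch."""
    return strip_selectors(ch) + VS15


def as_emoji(ch: str) -> str:
    """Hypothetical helper: request emoji-style presentation for ch."""
    return strip_selectors(ch) + VS16


snowman = "\u2603"  # U+2603 SNOWMAN, text-default according to Unicode

# Print the code points of each variant so the difference is visible
# even when the terminal font renders both the same way.
for label, s in [("default", snowman),
                 ("text", as_text(snowman)),
                 ("emoji", as_emoji(snowman))]:
    print(label, s, " ".join(f"U+{ord(c):04X}" for c in s))
```

In a document where a previously text-only character has started showing up as an emoji, applying something like `as_text` to each affected character is the kind of manual fix the answer refers to, with the caveat that not every renderer honours the selector.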