json unicode encoding emoji websocket-sharp

Packing an emoji as plain text unicode string php


I have a website and a Unity project that communicate with one another through a web server using WebSockets. I encode and decode the messages as JSON. On the Unity side, I am using Newtonsoft for JSON and websocket-sharp for WebSockets. Messages send fine and everything is working, but now I am trying to get emojis to display correctly in Unity.

I created a sprite sheet of all emojis and a dictionary whose keys are their Unicode code points and whose values are their positions in the sprite sheet. The issue is that when I receive an emoji (for example the 🤐 emoji, Unicode U+1F910), Unity receives it as "\uD83E\uDD10".

Is there a way to send the emoji as a string literal of its Unicode code point? If not, is there a way to parse the escaped form that C# sees back to the original code point? I have found a regex that converts more common symbols from the above format back to the corresponding symbol, but it does not give me the code point as a string. Here is what I am currently using:

var result = Regex.Replace(
    arrivedMessages[0],
    @"\\[Uu]([0-9A-Fa-f]{4})",
    m => char.ToString(
        (char)ushort.Parse(m.Groups[1].Value, NumberStyles.AllowHexSpecifier)));

With the above code, if the user sends a symbol such as º, the JSON reads \u00ba and the regex converts it back to º. When I send an emoji such as 🤐, the JSON reads "\ud83e\udd10" and the regex result is blank. Is there an issue with the regex, or is there a better way to go about this? Thanks!

Edit:

To simplify the overall question: is there a way to convert "\uD83E\uDD10" back to the string form of its Unicode code point, "U+1F910"?


Solution

  • Here is the function I ended up using to convert the surrogate pairs as @Mr Lister pointed out:

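            string returnValue = "";

            // Advance by 2 chars over a surrogate pair, otherwise by 1
            for (var i = 0; i < SurrogatePairString.Length; i += char.IsSurrogatePair(SurrogatePairString, i) ? 2 : 1)
            {
                // Combine the pair (if any) into a single code point, e.g. 0x1F910
                var codepoint = char.ConvertToUtf32(SurrogatePairString, i);

                // keep it uppercase for the regex, then when it is found, .ToLower()
                // Each pass overwrites returnValue, so with several code points only the last one is kept
                returnValue = String.Format("U+{0:X4}", codepoint);
            }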
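
For anyone who needs this for messages containing more than one emoji, here is a minimal sketch of the same loop that collects one key per code point instead of keeping only the last. The class name `EmojiKeys` and method name `ToCodePointKeys` are made up for illustration and are not part of any library. It assumes the input is already a decoded C# string (that is, the JSON escapes have been turned into real UTF-16 chars), and that it contains no lone surrogates, since `char.ConvertToUtf32` throws an `ArgumentException` on those.

```csharp
using System;
using System.Collections.Generic;

public static class EmojiKeys
{
    // Hypothetical helper: returns one "U+XXXX" key per code point in the input.
    public static List<string> ToCodePointKeys(string text)
    {
        var keys = new List<string>();

        // Step by 2 chars over a surrogate pair, otherwise by 1.
        for (var i = 0; i < text.Length; i += char.IsSurrogatePair(text, i) ? 2 : 1)
        {
            // For a pair this computes
            // 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00),
            // e.g. D83E DD10 -> 0x10000 + 0xF800 + 0x110 = 0x1F910.
            int codepoint = char.ConvertToUtf32(text, i);

            // X4 pads to at least four hex digits, uppercase.
            keys.Add(string.Format("U+{0:X4}", codepoint));
        }

        return keys;
    }

    public static void Main()
    {
        // "º" followed by the 🤐 emoji written as its surrogate pair.
        foreach (var key in ToCodePointKeys("\u00BA\uD83E\uDD10"))
        {
            Console.WriteLine(key); // prints U+00BA, then U+1F910
        }
    }
}
```

Each returned key can then be looked up in the sprite-sheet dictionary; call `.ToLower()` on it first if the dictionary keys are stored in lowercase.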