
How to Convert UTF-16 Surrogate Pairs in Decimal to Unicode Code Points in Java


I have some string data like this:

&#55357;&#56842;

These two values form a UTF-16 surrogate pair (a high surrogate followed by a low surrogate), written as decimal numeric character references.

How can I convert them to a Unicode code point in Java, so that my client receives a single decimal HTML entity instead of the surrogate pair?

Example: for the string above, I want to get &#128522; as the response.


Solution

  • Assuming you have already parsed the string to get the two numbers, just create a String from those two char values:

    String s = new String(new char[] { 55357, 56842 });
    System.out.println(s);
    

    Output

    😊
    

    To get the code point of that:

    s.codePointAt(0) // returns 128522
    

    You don't have to create a String, though:

    Character.toCodePoint((char) 55357, (char) 56842) // returns 128522
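To see where 128522 comes from, the sketch below applies the standard UTF-16 decoding formula by hand, codePoint = (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000, and compares it with `Character.toCodePoint`. The class and method names (`SurrogateMath`, `combine`) are hypothetical, and the method assumes its arguments really are a valid high/low surrogate pair. For these inputs: (55357 - 55296) * 1024 + (56842 - 56320) + 65536 = 62464 + 522 + 65536 = 128522.

```java
public class SurrogateMath {
    // Hypothetical helper: decodes a high/low surrogate pair by hand.
    // Assumes high is in 0xD800..0xDBFF and low is in 0xDC00..0xDFFF (not checked).
    static int combine(char high, char low) {
        // (high - 0xD800) supplies the top 10 bits, (low - 0xDC00) the bottom 10 bits
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }

    public static void main(String[] args) {
        char high = (char) 55357; // 0xD83D
        char low = (char) 56842;  // 0xDE0A
        int manual = combine(high, low);
        int library = Character.toCodePoint(high, low);
        System.out.println(manual);            // prints 128522
        System.out.println(manual == library); // prints true
    }
}
```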
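The answer assumes the two numbers have already been parsed out. As a minimal end-to-end sketch, the hypothetical `combineSurrogateEntities` method below scans for two adjacent decimal entities, and when they form a high/low surrogate pair it replaces both with one `&#NNNNNN;` entity for the combined code point. Everything else in the input is copied unchanged. It assumes entities are written exactly as `&#digits;` (no space before the semicolon, no hexadecimal `&#x...;` form); anything else is left as-is.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EntityConverter {
    // Two adjacent decimal entities, e.g. "&#55357;&#56842;" (digit count capped to avoid int overflow)
    private static final Pattern ENTITY_PAIR = Pattern.compile("&#(\\d{1,7});&#(\\d{1,7});");

    static boolean isHigh(int v) { return v >= 0xD800 && v <= 0xDBFF; }
    static boolean isLow(int v)  { return v >= 0xDC00 && v <= 0xDFFF; }

    // Hypothetical helper: merges surrogate-pair entities into single code point entities.
    static String combineSurrogateEntities(String input) {
        Matcher m = ENTITY_PAIR.matcher(input);
        StringBuilder out = new StringBuilder();
        int copied = 0; // input[0, copied) has already been written to out
        int from = 0;   // index where the next search starts
        while (from < input.length() && m.find(from)) {
            int high = Integer.parseInt(m.group(1));
            int low = Integer.parseInt(m.group(2));
            if (isHigh(high) && isLow(low)) {
                int codePoint = Character.toCodePoint((char) high, (char) low);
                out.append(input, copied, m.start())
                   .append("&#").append(codePoint).append(';');
                copied = m.end();
                from = m.end();
            } else {
                // Not a pair: retry from the second entity, which may start a pair itself
                from = m.start(2) - 2;
            }
        }
        out.append(input, copied, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(combineSurrogateEntities("&#55357;&#56842;")); // prints &#128522;
    }
}
```

Matching pairs rather than decoding every entity keeps entities such as `&#60;` from being turned into literal `<`, which could break the surrounding HTML.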